CartoDB Reflection

Once again, the timing of HIST680 is impeccable. I had just finished reviewing CartoDB when I went to my mailbox and pulled out this month’s Perspectives published by the AHA. The topic of one of the feature articles? You guessed it: digital mapping.

This simply reinforces my belief that taking this course and participating in the DH Certificate Program through GMU was not only a good decision, but a great one. Now on to my review.

[Figure: CartoDB heat map of the Alabama interviews]

CartoDB (created by Vizzuality) is an open-source, cloud-based mapping platform that is sure to please anyone seeking to store and visualize data geospatially. Basic usage is free with an account; however, expanded options are available with a paid subscription, and the company also provides support and custom mapping for an additional fee. The free account comes with 50 MB of storage, and data can be uploaded directly from the web and accessed via desktop, laptop, tablet, or smartphone. Part of what makes CartoDB so intuitive is its user-friendly interface: users can upload files by pasting a URL or by dragging and dropping a file. The program also accepts many data formats, such as Excel spreadsheets, text files, GPX files, and shapefiles, making CartoDB useful for humanities and STEM-related disciplines alike. Once multiple data layers are uploaded, users can create a visualization and manipulate it through several modes: heat, cluster, torque, bubble, simple, and others. Once the visualization has been organized and customized, CartoDB provides links and embed codes for sharing the map. Finally, CartoDB does a great job answering questions with online tutorials, FAQs, and “tips and tricks.” Google Maps ventured into web-based mapping first, but CartoDB takes it to a whole new level.

Our activity involved using data from the WPA Slave Narratives, and it was a great hands-on exercise in discerning the types of information and conclusions that can be drawn by viewing information geospatially. Visualizing the location of interviews works much like Photogrammar (Module 8): it allows users (teachers and students alike) to see several patterns, including patterns of travel, chronology, and the geographical concentration of interviews in particular areas of Alabama.

While our class activity provided the data, I am eager to experiment with data that I have collected myself. For example, I am working on images and maps for a recent manuscript, and I have the addresses for several colleges and universities in Nashville. I received an email last week from the press saying that they were unable to take my historical maps and provide layered data that would show the relationship between the location of institutions of higher education and the geographical trends of urban growth in Nashville from 1865 to 1930. I look forward to using CartoDB in the future.

 

 

Voyant Reflection

This module on text mining and data analysis is not only relevant but timely. Just yesterday, as I was working with Voyant and exploring data projects such as “Robots Reading Vogue,” a Bloomberg article appeared in my news feed. It provides a visual representation of this year’s presidential debate with word analyses based on big data:
http://www.bloomberg.com/politics/articles/2016-10-19/what-debate-transcripts-reveal-about-trump-and-clinton-s-final-war-of-words?bpolANews=true


I think Voyant is one of the coolest and most useful tools I’ve ever used. That said, the web version is very glitchy. Getting key words to show for different states and exporting the link that matched the correct visual took over four hours. Also, if I stepped away from my computer for any length of time, I had to start over with stop words, filters, etc. To get the desired export links, I found it easier to reload individual documents (one per state) into Voyant, and I hope the activity links I entered do in fact represent the differentiation I was seeking as I followed the activity directions. I would not use this with my students until I had worked out the kinks and fully tested the documents to be used in class. As an educator, I know all too well from experience that if something can go wrong with software or web-based applications when working with students, it usually does. I have since downloaded a version of Voyant to my computer and hope this will make it more user-friendly and maximize its utility for data analysis.

Despite the technical difficulties, Voyant allows users to mine and assess enormous amounts of data in many different ways. To have such a tool is an incredible gift for both teachers and students. You can visualize word usage with word clouds, follow links between words, chart the use of key words across a corpus or within a document, and view word use in context within a range from 10 words to the full text.
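For anyone curious what a keyword-in-context view is doing, here is a minimal Python sketch of the idea. It is not Voyant's code: the function name `contexts`, the `window` parameter, and the sample sentence are all my own stand-ins, and the default window of 10 words simply mirrors the low end of the range mentioned above.

```python
import re

def contexts(text, keyword, window=10):
    """Return each occurrence of keyword with up to `window` words on either side."""
    words = text.split()
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "river." still matches "river".
        if re.sub(r"^\W+|\W+$", "", word).lower() == target:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Made-up sample text, used only to show the shape of the result.
sample = "The old mill stood by the river. The river was wide."
for left, word, right in contexts(sample, "river", window=2):
    print(f"{left} [{word}] {right}")
```

Widening `window` until it covers the whole document is, roughly, the jump from a short context snippet to the full text.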

New users should:

  1. Open http://voyant-tools.org/
  2. Paste a URL or text, or upload a document, and generate the text data
  3. Manipulate “stop words” to appropriately cull key words
  4. Compare and contrast key words in different documents as well as across the entire corpus
  5. Study and analyze key words using the Cirrus (word cloud), Trends, Reader, Summary, and Contexts tools
  6. Draw conclusions
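
Steps 3 through 5 can be sketched in a few lines of Python. This is only an illustration of the underlying idea, not how Voyant itself is implemented: the stop-word list is a tiny hypothetical one, the function names are invented for this example, and the two sample sentences are made up rather than taken from any corpus.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; Voyant ships its own lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "my", "i", "by"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def key_words(text, stop_words=STOP_WORDS):
    """Count every token that is not a stop word."""
    return Counter(t for t in tokenize(text) if t not in stop_words)

def relative_frequency(text, term):
    """Share of all tokens in the text that match the term (0.0 for empty text)."""
    tokens = tokenize(text)
    return tokens.count(term.lower()) / len(tokens) if tokens else 0.0

# Made-up sample sentences standing in for two separate documents.
corpus = {
    "doc_a": "My mother sang and my mother cooked.",
    "doc_b": "The old mill stood by the river.",
}
for name, text in corpus.items():
    top = key_words(text).most_common(3)
    share = relative_frequency(text, "mother")
    print(name, top, round(share, 3))
```

Comparing the relative frequency of one term across documents is, in spirit, what a trends chart plots; loading one document per state gives one value per state.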

Trends: Frequency of “Mother” in Georgia WPA Slave Narratives

Trends: Frequency of “Mother” in North Carolina WPA Slave Narratives

Database Review

At first glance, American Poetry might not catch your eye or seem overly impressive. However, scratch beneath the surface of its simple homepage and you will find over 40,000 poems by more than 200 American poets from the colonial period to the early twentieth century. It is also connected to African American, Canadian, and British poetry and literature. The database is hosted and published by ProQuest through its humanities publishing imprint, Chadwyck-Healey. A digital publishing specialist, Chadwyck-Healey is “synonymous with innovation in electronic publishing since the release of the English Poetry Full-Text Database in 1992” (“About Chadwyck-Healey”).

American Poetry debuted in 1996 and offers multiple search options, including keyword, first line/title, and poet/author. For each of these options, the database generates a metadata index that lists the searchable terms found within the collection. If one is researching a specific poet, there are additional search fields where results can be filtered by gender, ethnicity, literary period, and years lived. Ethnicity and literary period also have indexes available to help users find and select appropriate terms recognized by the database. There are also collections linked on another page that are cross-searchable via the Literature Online interface. These include the African Writers Series, Twentieth-Century Drama, and an upgraded edition of the King James Bible online. The governance of this literature and poetry collection falls to a specially selected editorial board. Board members advise on the selection of texts and editions with the goals of comprehensiveness and inclusiveness.

The search options are easy to navigate, and once a user has run a search and selected an individual work, American Poetry provides a great deal of information about the literary period and author. For each poem or work of literature, there is a link with information about the author: gender, birth/death dates, ethnicity, nationality, and literary period. The poem itself is available in full text, but it is transcribed directly onto the webpage and the original is not viewable. While those seeking the text alone (and its legibility) will be satisfied, it leaves a bit to be desired for the historian or digital humanist who wonders what was lost through digitization. There is no exportable image, and searching within the text can only be done using Control+F, as on any webpage. There are options for “Print View,” “Download Citation,” and “Text Only.”

Surprisingly, the “Download Citation” option is clunky compared to the database’s overall streamlined organization and presentation format. The necessary information is there, but the export and formatting options require additional steps. Rather than go through this process, users would be better off typing up the citation the old-fashioned way, formatting and entering it manually in a document. There is also a “Durable URL” option, but it simply provides a link that can be saved or emailed. A recipient who does not have access to the database will not be able to view the linked material without signing in with a user name and password. However, this feature can help the researcher generate a quick list of links.

Chadwyck-Healey began publishing in 1973 and has spent over £50 million over the last decade. The bibliographic basis of American Poetry is the Bibliography of American Literature (Yale University Press, 1955-1991), supplemented with additional poets recommended by the Editorial Board to “provide a thorough representation.” Text conversion proceeded through four stages: selection of texts, encoding and indexing, re-keying and scanning, and preservation. The selection of texts involved a consortium of scholars, research libraries, national libraries, and a publishing team. The encoding method was Standard Generalised Mark-Up Language (SGML). As stated, “SGML encoding of original texts allows works to be divided into content elements . . . and recognized accordingly that provides a route through vast amounts of data” (“Text Conversion”). The re-keying and scanning process took the SGML-encoded text and compared it to text generated by Optical Character Recognition (OCR). Re-keying primarily rectifies spelling and punctuation discrepancies. During the digitization process, the entire text of each poem was included, as well as any accompanying text “written by the poet and forming an integral part of the poem” (“About American Poetry”). This allows for preservation of materials.
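
The publisher's documentation, as summarized above, does not say how the re-keyed and OCR texts were actually compared. Purely as an illustration of the general idea, here is a small Python sketch that uses the standard library's `difflib` to flag the words where two transcriptions of the same line disagree. The function name `discrepancies` and the sample strings are hypothetical, and nothing here should be read as Chadwyck-Healey's process.

```python
import difflib

def discrepancies(rekeyed, ocr):
    """List (tag, rekeyed_words, ocr_words) for every span where the two texts differ."""
    a, b = rekeyed.split(), ocr.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Made-up example: the "OCR" version misreads one letter as a digit.
rekeyed_line = "the quick brown fox"
ocr_line = "the qu1ck brown fox"
for tag, left, right in discrepancies(rekeyed_line, ocr_line):
    print(tag, left, right)
```

A person would still have to decide which reading is correct; a comparison like this only narrows down where to look.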

Access to the collection follows a strict subscription-only policy; however, it can be accessed remotely. Since most databases today are primarily accessed remotely, this designation shows the age of the database a bit, harkening back to the days of library-only or on-campus databases. Other features also show its age, including notes on how to navigate JavaScript, advice on which internet browser to use (Internet Explorer is listed), 18 different step-by-step sample searches, an option to change the system color (for user preference), and shortcut keys to navigate the site “without using a mouse.” In today’s touchpad, cloud-based world, many of these features are antiquated, as students and faculty alike are more sophisticated and search-savvy.

American Poetry remains a model of the early digitized database, designed with students and educators (and paid subscriptions) in mind. The publisher, Chadwyck-Healey, boasts that it is used by “specialist researchers to undergraduates alike” and that its full-text primary source materials “create fresh avenues for critical debate, scholarly dialogue, and serendipitous discovery.” While this claim may be a bit far-fetched, this digital collection does make available to the digital world a vast amount of poetry and literature related to “America” and mother “Britain.” For this reason, American Poetry is still very much worth the price of an institutional subscription.