Month: October 2016

CartoDB Reflection

Once again, the timing of HIST680 is impeccable. I had just finished reviewing CartoDB when I went to my mailbox and pulled out this month’s Perspectives published by the AHA. The topic of one of the feature articles? You guessed it: digital mapping.

This simply reinforces my belief that taking this course and participating in the DH Certificate Program through GMU was not only a good decision, but a great one. Now on to my review.

CartoDB (created by Vizzuality) is an open-source, cloud-based mapping platform that should please anyone seeking to visualize and store geospatial data. Basic usage is free with an account; expanded options are available with a paid subscription, and the company also provides support and custom mapping for an additional fee. The free account comes with 50 MB of storage, and data can be collected and uploaded directly from the web and accessed via desktop, laptop, tablet, or smartphone. Part of what makes CartoDB so intuitive is its user-friendly interface. Users can upload files by pasting a URL or by dragging and dropping a file. The program also accepts many data formats, such as Excel, text files, GPX, and shapefiles, making CartoDB useful for humanities and STEM-related disciplines alike. Once multiple data layers are uploaded, users can create a visualization and manipulate it through several modes: heat, cluster, torque, bubble, simple, and others. Once the visualization has been organized and customized, CartoDB provides links and embed codes for sharing the map. Finally, CartoDB does a great job answering questions with online tutorials, FAQs, and “tips and tricks.” Google Maps ventured into web-based mapping tools first, but CartoDB takes it to a whole new level.

Our activity involved using data from the WPA Slave Narratives, and it was a great hands-on exercise in discerning the types of information and conclusions that can be drawn by viewing information geospatially. By visualizing the location of interviews, the map works much like Photogrammar (Module 8), allowing users (teachers and students alike) to see several patterns: travel routes, chronology, and the geographical concentration of interviews in particular areas of Alabama.

While our class activity provided the data, I am eager to experiment with data that I have collected myself. For example, I am working on images and maps for a recent manuscript, and I have the addresses for several colleges and universities in Nashville. I received an email last week from the press that said they were unable to take my historical maps and provide layered data that would show the relationship between the location of institutions of higher education and the geographical trends of urban growth in Nashville from 1865 to 1930. I look forward to using CartoDB in the future.

Tags Alabama, CartoDB, data mapping, geospatial mapping, mapping, Photogrammer, Slave Narratives, Vizzuality, WPA

Uncategorized

Voyant Reflection

This module about data and text mining and analysis is not only relevant but timely. Just yesterday as I was working with Voyant and exploring data projects such as “Robots Reading Vogue,” I saw this in my news feed. This Bloomberg article provides a visual representation of this year’s presidential debate with word analyses based on big data:
http://www.bloomberg.com/politics/articles/2016-10-19/what-debate-transcripts-reveal-about-trump-and-clinton-s-final-war-of-words?bpolANews=true

I think Voyant is one of the coolest and most useful tools I’ve ever used. That said, the web version is very glitchy. Getting key words to show for different states, and exporting the link that matched the correct visual, took over four hours. Also, if I stepped away from my computer for any length of time, I had to start over with stop words, filters, etc. To get the desired export links, I found it easier to reload individual documents (one per state) into Voyant, and I hope the activity links I entered do in fact represent the differentiation I was seeking as I followed the activity directions. I would not use this with my students until I had worked out the kinks and fully tested the documents to be used in class. As an educator, I know all too well from experience that if something can go wrong with software or web-based applications when working with students, it usually does. That said, I have downloaded a version of it to my computer and hope this will make Voyant more user-friendly and maximize its utility for data analysis.

Despite technical difficulties, Voyant allows users to mine and assess enormous amounts of data in many different ways. To have such a tool is an incredible gift for both teachers and students. You can visualize word usage with word clouds, see links between words, chart the use of key words across a corpus or within a document, and view word use in context within a range from 10 words to full text.

New users should:

Open http://voyant-tools.org/
Paste a URL or text, or upload a document, to generate text data
Manipulate “stop words” to appropriately cull key words
Compare/contrast key words in different documents as well as across the entire corpus
Study and analyze key words using word cirrus, trends, reader, summary, and contexts
Draw conclusions
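For readers who want to see what the stop-word and trends steps are doing, here is a minimal sketch in Python. It is not how Voyant itself is implemented; the stop-word list, the function names, and the sample sentence are all my own assumptions, and the sample sentence is invented rather than quoted from the narratives.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; Voyant uses its own, longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "i", "my", "me", "us"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=5):
    """Return the n most frequent terms after removing stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)

def relative_frequency(text, term):
    """Share of all tokens (stop words included) that match the term."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)

def trend(text, term, segments=2):
    """Split the document into equal segments and count the term in each."""
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    return [tokens[i:i + size].count(term.lower())
            for i in range(0, len(tokens), size)]

# Invented sample sentence, not a quotation from the WPA narratives.
sample = "My mother worked in the field. Mother sang to us at night."
print(top_terms(sample, 1))
print(relative_frequency(sample, "mother"))
print(trend(sample, "mother", segments=2))
```

Running the same functions on one document per state would give a rough, do-it-yourself version of the comparison behind the Georgia and North Carolina trend graphs.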

Trends: Frequency of “Mother” in Georgia WPA Slave Narratives

Trends: Frequency of “Mother” in North Carolina WPA Slave Narratives

Tags cirrus, data, data mining, Georgia, North Carolina, Slave Narratives, text analysis, text mining, Voyant, word cloud, WPA

Uncategorized

“Digitizing My Kitchen” Exhibit

Post author By admin
Post date October 6, 2016

Using Omeka, I created this practice exhibit.

http://drpethel.com/Omeka/exhibits/show/digitizing

Despite my best efforts, the thumbnails and initial record show images rotated 90 degrees left. If users click on the image, however, it will open in a new window properly rotated.

Tags exhibit, Omeka

Uncategorized

Metadata Review for American Consumer Culture

Post author By admin
Post date October 4, 2016

American Consumer Culture homepage
*Copyright information at bottom of post

One of the most engaging, comprehensive, and unique databases I have recently discovered is called American Consumer Culture: Market Research and American Business. This database provides insight into the world of buying, selling, and advertising from 1935 to 1965, a pivotal point in American production, consumption, and media/technology. The collection provides access to thousands of market research reports by pioneering analyst Ernest Dichter, who founded the Institute for Motivational Research (1946). In contrast to other advertising experts and market analysts of the post-World War II era, “Dichter’s techniques were largely qualitative, focusing on depth interviews and projective tests rather than simple surveys” (“Nature and Scope”). The sources included in American Consumer Culture are either graphic still images or text, and they include memoranda, reports, advertisements, and other industry or business-related documents. Advanced searches have Boolean, primary/secondary source, and (corporate) brand filters.

The search process and metadata mining are quite impressive, allowing the user to ask and answer questions based on a variety of searchable fields including author, date, document type, and keyword. These fields are also cross-referenced chronologically and thematically with additional components of the database: a comprehensive timeline and thirty-one thematic collections organized within the larger structural framework (ex. retail and wholesale). Each thematic collection includes an introduction, description, and examples. (See: Industries). There are a few cracks in the metadata search engine; for example, it is difficult to determine where and how many of these documents were used. The use and audience of advertisements are fairly straightforward, but for many of the other documents (reports, studies, memos), one wonders: Who was the audience, and how did that affect and shape the conclusions drawn and arguments presented?

Within the record of the digital object, American Consumer Culture: Market Research and American Business continues to impress. Here is an example for a document entitled “The A-B-C of humor in advertising,” a 1967 report published by Leo Burnett Company, Inc.

This search result, and the metadata included, is a great model for creating clear and consistent “data about data.” It describes several of the document’s features, including physical location of the original (box #, report #), holding library or institution, language, related document info link, date, and copyright. In terms of the original document, additional information is provided: document type, industry, commissioned by (original producer), conducted by (consulting firm), location of consulting firm, method of consultation (ex. test, survey), and keywords. All of these categories work with controlled vocabulary, a key component in creating “successful” metadata. There are also links to the controlled vocabulary glossary and to a relevant chronology.
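To make the idea of controlled vocabulary concrete, here is a small hypothetical sketch in Python. The field names follow the categories listed above, but the vocabulary lists and the placeholder values are my own assumptions; they are not copied from the database, and this is not how the database itself validates records.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies; the database's real term lists are not reproduced here.
DOCUMENT_TYPES = {"report", "memorandum", "advertisement"}
METHODS = {"test", "survey", "depth interview"}

@dataclass
class Record:
    """One metadata record, using the descriptive categories named in the review."""
    title: str
    date: str
    document_type: str
    industry: str
    commissioned_by: str
    conducted_by: str
    method: str
    keywords: list = field(default_factory=list)

def vocabulary_errors(record):
    """Return the names of fields whose values fall outside the controlled vocabulary."""
    errors = []
    if record.document_type not in DOCUMENT_TYPES:
        errors.append("document_type")
    if record.method not in METHODS:
        errors.append("method")
    return errors

# Title and date come from the example above; every other value is a placeholder.
example = Record(
    title="The A-B-C of humor in advertising",
    date="1967",
    document_type="report",
    industry="(placeholder)",
    commissioned_by="(placeholder)",
    conducted_by="(placeholder)",
    method="survey",  # placeholder value, not taken from the actual record
)
print(vocabulary_errors(example))
```

The point of the check is consistency: a cataloger who types “Reports” instead of “report” gets flagged, so every record can be found with the same search term.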

As for the features of the digital objects described by metadata, there are options to download as PDF, and pages can be viewed in full-page or thumbnail view. The document is also keyword searchable and offers an export/citation option. Features not described by metadata are the scanning specifications, scan technician, application, pixels, dpi, and other metadata related to the actual digitization process. Some of this information can be obtained by right-clicking the “properties” of the document once downloaded, but it is not available from the database itself.

American Consumer Culture is a great example of the overlap between definitions that both compete with and complement one another (and that were heavily discussed in our readings): project, collection, database, and digital thematic research. In the end, regardless of categorization, American Consumer Culture epitomizes “the closest thing that we have in the humanities to a laboratory,” as Kenneth Price argued.

*Copyright information listed on the use of images or text accessed through American Consumer Culture: This selection of images is protected by copyright, and duplication or sale of all or part of the image selection is not permitted, except that the images may be duplicated by you for your own research or other approved purpose either as prints or by downloading. Such prints or downloaded records may not be offered, whether for sale or otherwise, to anyone who is not a member of staff of the publisher. You are not permitted to alter in any way downloaded records without prior permission from the copyright owner. Such permission shall not be unreasonably withheld.

Tags advertising, American Consumer Culture, database, Digital Humanities, Digital Image, documents, HIST680, keyword search, metadata, project, thematic research collection

Uncategorized

Database Review

At first glance, American Poetry might not catch your eye or seem overly impressive. However, scratch beneath the surface of its simplistic homepage and you will find over 40,000 poems by more than 200 American poets from the colonial period to the early twentieth century. It is also connected to African American, Canadian, and British poetry and literature. The database is hosted and published by ProQuest by way of its humanities publishing imprint, Chadwyck-Healey. A digital publishing specialist, Chadwyck-Healey is “synonymous with innovation in electronic publishing since the release of the English Poetry Full-Text Database in 1992” (“About Chadwyck-Healey”).

The database American Poetry debuted in 1996 and offers multiple search options, which include keyword, first line/title, and poet/author. For any of these options there is a metadata search index generated by the database that offers a list of searchable terms found within the collection. If one is researching a specific poet, there are additional search fields where results can be mined by gender, ethnicity, literary period, and years lived. Ethnicity and literary period also have indexes available to help users find and select appropriate terms recognized by the database. There are also collections linked on another page that are cross-searchable via the Literature Online interface. Some samples of these collections include African Writers Series, Twentieth-Century Drama, and an upgraded edition of the King James Bible online. The governance of this literature and poetry collection falls under a specially selected editorial board. Board members advise on the selection of texts and editions with the goals of comprehensiveness and inclusiveness.

After performing a search, using the easily navigable search options, and selecting an individual work, users find a great deal of information provided by American Poetry in regard to the literary period and author. For each poem or work of literature, there is a link with information about the author: gender, birth/death dates, ethnicity, nationality, and literary period. For the poem itself, there is full text, but it is transcribed right onto the webpage and the original is not viewable. While those seeking the text alone (and its legibility) will be satisfied, it leaves a bit to be desired for the historian or digital humanist who wonders what was lost through digitization. There is no exportable image, and searching within the full text can only be done using Control+F, as you can on any webpage. There are options for “Print View,” “Download Citation” and “Text Only.”

Surprisingly, the “Download Citation” option is clunky compared to the database’s overall streamlined organization and presentation format. The necessary information is there, but the export and formatting options require additional steps. Rather than go through this process, users would be better off typing up the citation the old-fashioned way, formatting and entering it manually in a document. There is also a “Durable URL” option, but it simply provides a link that can be saved or emailed. Someone who receives the link by email but does not have access to the database will not be able to view it without signing in with a user name and password. However, this feature can help the researcher generate a quick link list.

Chadwyck-Healey began publishing in 1973 and has spent over £50 million over the last decade. The bibliographic basis of American Poetry is the Bibliography of American Literature (Yale University Press, 1955-1991), supplemented with additional poets recommended by the Editorial Board to “provide a thorough representation.” Text conversion was processed through four stages: selection of texts, encoding and indexing, re-keying and scanning, and preservation. The selection of texts involved a consortium of scholars, research libraries, national libraries, and a publishing team. The encoding method was Standard Generalised Mark-Up Language (SGML). As stated, “SGML encoding of original texts allows works to be divided into content elements . . . and recognized accordingly that provides a route through vast amounts of data” (“Text Conversion”). The re-keying and scanning process took the SGML text and compared it to text generated by Optical Character Recognition (OCR). Re-keying primarily rectifies spelling and punctuation discrepancies. During the digitization process, the entire text of each poem was included as well as any accompanying text “written by the poet and forming an integral part of the poem” (“About American Poetry”). This allows for preservation of materials.
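The comparison of re-keyed text against OCR output can be illustrated with a short Python sketch. This is only my guess at the general idea, using the standard library’s difflib module; it is not the publisher’s actual process, and the sample line is invented rather than taken from the collection.

```python
import difflib

def discrepancies(rekeyed, ocr):
    """Return (rekeyed_words, ocr_words) pairs where the two transcriptions disagree."""
    a, b = rekeyed.split(), ocr.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Collect the words on each side that fail to line up.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs

# Invented sample line; the second version has a typical OCR misreading ("e" read as "c").
rekeyed_line = "the quiet river runs to the sea"
ocr_line = "the quiet rivcr runs to the sea"
print(discrepancies(rekeyed_line, ocr_line))  # prints [('river', 'rivcr')]
```

A human editor would then review each flagged pair, which is where the spelling and punctuation corrections described above would come in.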

Access to the collection follows a strict subscription-only policy; however, it can be accessed remotely. While most databases today are accessed remotely, this designation shows the age of the database a bit, harkening back to the days of library-only or on-campus databases. Other features also show the database’s age, including notes on how to navigate JavaScript, which internet browser to use (Internet Explorer is listed), 18 different step-by-step sample searches, options for changing the system color (for user preference), and a shortcut key to navigate the site “without using a mouse.” In today’s touchpad, cloud-based world, many of these features are antiquated, as students and faculty alike are more sophisticated and search-savvy.

American Poetry remains an early model of digitized databases, designed with students and educators (and paid subscriptions) in mind. The publisher, Chadwyck-Healey, boasts that it is used by “specialist researchers to undergraduates alike” and that its full-text primary source materials “create fresh avenues for critical debate, scholarly dialogue, and serendipitous discovery.” While this claim may be a bit far-fetched, this digital collection does make a vast amount of poetry and literature related to “America” and mother “Britain” available to the digital world. For this reason, American Poetry is still very much worth the price of an institutional subscription.

Tags American Poetry, Chadwyck-Healey, database, Digital Humanities, Digitization, Digitized content, Encoding, Full-text, HIST680, JavaScript, OCR, Optical Character Recognition, Poetry, Poetry collections, SGML, Standard Generalised Mark-Up Language