Week 9: Markup, Mining & Textual Analysis

Tuesday, November 1st


Overview: How are documents encoded in order to be machine-readable? In the humanities, why is this done? Regardless of whether most people notice them, what are some everyday examples of markup? What encoding guidelines or standards might be relevant to projects in this class? And what are some approaches to reading (or “not reading”) large amounts of text at once? How might analyzing a large amount of text brush against (if at all) investments in markup?


Reading Due: (1) “A Pleasant Little Chat about XML” by Julie Meloni, and (2) “XHTML & CSS” by Stewart Arneil and Greg Newton


Suggested Reading: (1) HTML Dog, (2) “Dive into HTML5” by Mark Pilgrim, (3) “Text Encoding” by Allen H. Renear, (4) “What is Text Analysis, Really?” by Geoffrey Rockwell, (5) “Going Electronic” (part of the Scholarly Introduction to Orlando: Women’s Writing in the British Isles from the Beginnings to the Present) by Susan Brown, Patricia Clements, and Isobel Grundy, (6) the Text Encoding Initiative, (7) “Working with APIs” by Julie Meloni, and (8) “Principles of Voyeur” and “Quick Guide of Voyeur” by Stéfan Sinclair


Assignment Due: None. Take a break, people.


Outcomes: Survey a few free text editors suitable for doing markup. Review how XHTML and CSS function in your cluster blogs. Encode and validate some evidence from your project in KML (an XML notation for expressing geographic annotation). Experiment with TAPoR and Many Eyes, using them to analyze your cluster’s blog and/or the evidence you’ve gathered this term.
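
For anyone who wants a head start on the KML outcome, here is a minimal sketch in Python of what encoding one piece of evidence might look like. The function names, the placemark name, the description, and the coordinates are all placeholders invented for illustration, not part of any class dataset. The check at the end only confirms that the XML is well-formed; it does not validate the file against the full KML schema, which a dedicated validator would do.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = "http://www.opengis.net/kml/2.2"


def make_placemark_kml(name, description, lon, lat):
    """Build a one-placemark KML document and return it as a string.

    KML writes coordinates as longitude,latitude (in that order).
    """
    ET.register_namespace("", KML_NS)  # serialize without an ns0: prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if the text parses as XML (well-formedness only)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Placeholder values, chosen only to show the shape of the output.
    sample = make_placemark_kml(
        "Sample site", "Placeholder description", -123.3656, 48.4284
    )
    print(sample)
    print("Well-formed:", is_well_formed(sample))
```

Pasting the printed output into a file ending in .kml should give something that mapping tools can open, though the tools we use in class may expect additional elements.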


Friday, November 4th


Overview: What does it mean to mine a digital archive? Under what circumstances would mining be effective or even necessary? What kinds of interpretations does it enable? What does it foreclose? What do we risk when we render data mining objective? Or data “raw”? What can we learn from how mining or “culturomics” projects are conducted, organized, and presented (including how the humanities, or humanities scholars, are involved in the collaborative process)?


Reading Due: (1) “From Babel to Knowledge: Data Mining Large Digital Collections” by Dan Cohen, and (2) “Culturomics?” by Matt Thompson


Viewing Due: “TEDxBoston – Erez Lieberman Aiden & Jean-Baptiste Michel – A Picture is Worth 500 Billion Words”



Suggested Reading: “Counting on Google Books” by Geoffrey Nunberg


Assignment Due: Blog Entry #5 (question circulated during class on Tuesday, November 1st). Please also come to class with one particular question about Cohen’s article or the TEDx video.


Outcomes: Collectively experiment with Google Books Ngram Viewer (including the about page). Determine differences and intersections between distant reading, text analysis, data mining, and culturomics. Identify why these differences and intersections matter.
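
To make the term "n-gram" concrete before we open the viewer, here is a small Python sketch that counts word sequences in a short passage. It is only an illustration of the counting idea under simple assumptions (lowercasing, a crude definition of a word); it is not how the Google Books Ngram Viewer is implemented, and the function name and sample sentence are made up for this example.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count runs of n consecutive words in a text.

    Words are lowercased, and anything other than letters and
    apostrophes is treated as a separator.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Line up the word list against shifted copies of itself.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


if __name__ == "__main__":
    sample = "The cat sat on the mat and the cat slept"
    for gram, count in ngram_counts(sample, n=2).most_common(3):
        print(gram, count)
```

Notice how many decisions (what counts as a word, whether case matters) are made before any counting happens. That is one place where the questions about "raw" data in this week's overview come into play.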


Index image care of Wikipedia.



Huma 150 @ UVic