INSTRUCTORS:
This course is an introduction to developing and working with texts electronically, particularly literary and historical language texts. It lays a strong conceptual foundation and provides hands-on experience working with texts. Many introductions to text analysis or corpus building focus on texts such as newspapers, product reviews, or tweets. This course instead focuses on texts of particular relevance to Signum students (for example, a Norse saga or a 19th-century novel) and on the fundamentals of preparing and analysing specific literary texts.
The course will provide both a conceptual and a practical introduction to topics such as text preparation, copyright, metadata, character encoding, typography, markup, annotation, concordancing, and some basic quantitative analysis. No background in statistics or programming is required, although we will touch on both fields in the context of text analysis. Students will have an opportunity to work on projects relating to a text of their choice (an out-of-copyright literary text in English or, if the student has an appropriate language background, a text in a historical language). The course will also provide a useful foundation for further study in digital philology, computational literary studies, natural language processing, and quantitative methods in the humanities.
Prerequisite: This course has no prerequisite; however, a student who wants to focus on a historical language text for their project (e.g. an Old Norse saga or Old English poem) should already have some familiarity with that language (either through a Signum course or a comparable course at another institution).
Note on M.A. Degree Requirements: This course does count toward a Master’s student’s language requirement. This course does not automatically count toward any concentration. However, a student may petition to have it count toward a concentration depending on the topic of their project(s) (e.g. a student who uses a historical language text for their projects might count it toward a Germanic Philology concentration). This is subject to approval by the course faculty and the Dean of Language and Literature. The student should contact their advisor to facilitate the petition for approval.
Weekly Schedule
This live course will include two 1-hour lectures and two 1-hour discussion sessions per week as assigned (4 hours total weekly). Remember to indicate your availability on the Goldberry registration system.
Course Schedule
Week 1
- Introduction & History
Week 2
- Text Sources
- Copyright and Metadata
Week 3
- Text Structure and Citation Systems
- Typography and Typefaces
Week 4
- Character Encoding and Unicode
- Markup and Markup Languages
Week 5
- Basic SGML / XML
- HTML and CSS
Week 6
- Version Control and GitHub
- The Text Encoding Initiative (TEI) Part 1
Week 7
- The Text Encoding Initiative (TEI) Part 2
- Search and Information Retrieval
Week 8
- Wildcards, Globs, and Regular Expressions
- Tokens, Types, and Word Frequency
Week 9
- Stemming, Lemmatization, and Part of Speech Tagging
- Concordances and Corpus Linguistics
Week 10
- Probability and Descriptive Statistics
- Inferential Statistics
Week 11
- Introduction to Python
- Text Processing with Python
Week 12
- Natural Language Processing
- Visualization
- Next Steps
Note on the Summer 2022 Schedule: The summer 2022 semester will include a one-week summer break, which will fall in the week of June 20, 2022.
Required Texts
There are no required texts to purchase for this course. Required readings will be supplied by the instructor in the final syllabus.
Course History
Semester | Preceptor(s) |
Summer 2022 | James Tauber |