Computer-supported collation with CollateX


[Image of koala on keyboard]

Maintained by: David J. Birnbaum (djbpitt@gmail.com). License: Creative Commons BY-NC-SA 4.0 International License. Last modified: 2015-06-30T07:51:03+0000


Description and goals

The one-day Computer-supported collation with CollateX workshop is part of DH2015: Global digital humanities, the annual international conference of the Alliance of Digital Humanities Organizations (ADHO), hosted by the University of Western Sydney from 2015-06-29 through 2015-07-03. The workshop takes place on Monday, 2015-06-29, in Building EA, Room 109, at the University of Western Sydney (Parramatta South Campus) from 9:30 to 4:30, with a lunch break (on your own) from 12:30 to 1:30.

This workshop will teach participants how to use the open-source CollateX collation tool to compare witnesses of a text automatically, in a way that can be used to produce critical textual editions and other types of comparative documents. Participants will learn how to prepare source materials in any written script for collation, how to perform automated collation using CollateX, and how to inspect and modify the results.

Participants must bring their own laptops and must install Python 3 and CollateX in preparation for the workshop (see the links to installation instructions below). No prior Python programming experience is required.

Instructors

Schedule

9:30–10:10 Unit 1: Theory of collation; Gothenburg model; automated collation [Prezi] (local copy [pdf])
10:10–10:45 Unit 2: CollateX environment: IPython, the command line, the hierarchical file system
10:45–11:15 Coffee break (catered, EA building foyer)
11:15–11:50 Unit 3: Witnesses, tokens, tokenization
11:50–12:30 Unit 4: Collating plain text, output options (alignment table, variant graph, TEI parallel segmentation, critical apparatus)
12:30–1:30 Lunch break (catered, EA building foyer)
1:30–2:10 Unit 5: Using CollateX with XML: recognizing and tracking markup information during collation
2:10–2:45 Unit 6: Refining the collation: normalization
2:45–3:15 Coffee break (catered, EA building foyer)
3:15–3:50 Unit 7: Processing tokens differently according to markup information
3:50–4:30 Unit 8: Applications of CollateX: workshop project showcase
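
As a rough preview of what Units 3 and 6 cover, the sketch below tokenizes two short witnesses and attaches a normalized form to each token, using only the Python standard library. It is an illustration under stated assumptions, not workshop material: the function names (tokenize, normalize, build_witness), the sample sentences, and the normalization rule (lowercasing and stripping surrounding ASCII punctuation) are all hypothetical. The "t" (original reading) and "n" (normalized reading) keys follow the token shape used by CollateX's pre-tokenized JSON input.

```python
import json
import re
import string


def tokenize(text):
    """Split on whitespace, keeping trailing whitespace attached to each token."""
    return re.findall(r"\S+\s*", text)


def normalize(token):
    """Illustrative rule only: trim whitespace and surrounding ASCII punctuation, then lowercase."""
    return token.strip().strip(string.punctuation).lower()


def build_witness(siglum, text):
    """Build one witness as a dict with an "id" and a list of t/n token dicts."""
    return {
        "id": siglum,
        "tokens": [{"t": t, "n": normalize(t)} for t in tokenize(text)],
    }


# Hypothetical sample witnesses, invented for this sketch.
witnesses = {
    "witnesses": [
        build_witness("A", "The Quick brown fox."),
        build_witness("B", "the quick, brown dog."),
    ]
}

print(json.dumps(witnesses, indent=2))
```

With this rule, "Quick " in witness A and "quick, " in witness B share the normalized form "quick", so a collation that compares "n" values would treat them as matching while still reporting the original "t" readings. A structure of this shape should be usable as pre-tokenized input to CollateX, but check that against the documentation for the version you install.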

Workshop resources

Installation

Data

Other workshop resources

General CollateX resources


Acknowledgements: USB sticks generously contributed by eXist-db. Koala-keyboard image from Buzzfeed.