Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2015-06-30T07:51:03+0000
The one-day Computer-supported collation with CollateX workshop is part of DH2015: Global digital humanities, the annual international conference of the Alliance of Digital Humanities Organizations (ADHO), hosted by the University of Western Sydney from 2015-06-29 through 2015-07-03. The workshop takes place on Monday, 2015-06-29, in Building EA, Room 109, at the University of Western Sydney (Parramatta South Campus) from 9:30–4:30, with a lunch break (on your own) from 12:30–1:30.
This workshop will teach participants how to use the open-source CollateX collation tool to compare witnesses of a text automatically, in a way that can be used to produce critical textual editions and other types of comparative documents. Participants will learn how to prepare source materials in any written script for collation, how to perform automated collation using CollateX, and how to inspect and modify the results.
Participants must bring their own laptops and must install Python 3 and CollateX in preparation for the workshop (see the links to installation instructions below). No prior Python programming experience is required.
9:30–10:10 | Unit 1: Theory of collation; Gothenburg model; automated collation (local copy) |
10:10–10:45 | Unit 2: CollateX environment: IPython, the command line, the hierarchical file system |
10:45–11:15 | Coffee break (catered, EA building foyer) |
11:15–11:50 | Unit 3: Witnesses, tokens, tokenization |
11:50–12:30 | Unit 4: Collating plain text, output options (alignment table, variant graph, TEI parallel segmentation, critical apparatus) |
12:30–1:30 | Lunch break (catered, EA building foyer) |
1:30–2:10 | Unit 5: Using CollateX with XML: recognizing and tracking markup information during collation |
2:10–2:45 | Unit 6: Refining the collation: normalization |
2:45–3:15 | Coffee break (catered, EA building foyer) |
3:15–3:50 | Unit 7: Processing tokens differently according to markup information |
3:50–4:30 | Unit 8: Applications of CollateX: workshop project showcase |
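Units 3 and 6 of the schedule deal with tokenization and normalization. As a minimal sketch of those two ideas (not taken from the workshop materials), the Python below splits two made-up witnesses into tokens and pairs each token with a normalized form. The `witnesses`, `id`, `tokens`, `t`, and `n` keys follow the JSON input format that CollateX documents for pre-tokenized witnesses; the function names, the sample sentences, and the normalization rule (lowercasing and trimming surrounding punctuation) are assumptions chosen for illustration.

```python
import json
import re


def tokenize(text):
    """Split text into tokens, keeping any trailing whitespace with each token."""
    return re.findall(r"\S+\s*", text)


def normalize(token):
    """Illustrative rule only: trim non-word characters at both ends and lowercase."""
    return re.sub(r"^\W+|\W+$", "", token).lower()


def to_witness(siglum, text):
    """Build one witness: 't' holds the original reading, 'n' the normalized form."""
    tokens = [{"t": t, "n": normalize(t)} for t in tokenize(text)]
    return {"id": siglum, "tokens": tokens}


# Hypothetical sample witnesses that differ in case, punctuation, and one word.
witnesses = {
    "witnesses": [
        to_witness("A", "The quick brown fox."),
        to_witness("B", "the quick, brown dog"),
    ]
}

print(json.dumps(witnesses, indent=2))
```

With this rule, "The " and "the " share the normalized form "the", so a collation that compares `n` values would treat them as matching while the `t` values preserve the original readings for display. A real project would likely need a normalization rule tuned to its own script and orthography.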
If you have already installed CollateX, make sure that you have the most recent version by running:
pip install --pre --upgrade collatex
Python-Levenshtein binaries for Microsoft Windows. Users of Microsoft Windows who are unable to install the Python-Levenshtein library from source can install these precompiled binaries instead (mirrored from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-levenshtein):
python_Levenshtein-0.12.0-cp34-none-win32.whl (for 32-bit Windows)
python_Levenshtein-0.12.0-cp34-none-win_amd64.whl (for 64-bit Windows)
The installation procedure is described in detail in our installation instructions.
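If you are unsure which of the two wheels applies, the following sketch reports the tags of the Python interpreter you are running. It assumes the usual wheel naming convention, in which `cp34` means CPython 3.4 and `win32` / `win_amd64` refer to a 32-bit or 64-bit interpreter; the function name `wheel_tags` is made up for this example, and the platform tag it prints is only meaningful on Microsoft Windows.

```python
import struct
import sys


def wheel_tags():
    """Return the (python_tag, platform_tag) pair for the running interpreter.

    Assumes CPython on Windows; on other systems the platform tag is not applicable.
    """
    # The size of a C pointer tells us whether the interpreter is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


python_tag, platform_tag = wheel_tags()
print("Python tag:", python_tag)
print("Platform tag (Windows only):", platform_tag)
```

The wheels listed above carry the `cp34` tag, so if the reported Python tag differs, they are unlikely to install as-is and the installation instructions should be followed instead.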
2499/data/xmlsubdirectory)
Glossary of CollateX and collation terminology
The Gothenburg model: A modular architecture for computer-aided collation
Computer-supported collation of modern manuscripts: CollateX and the Beckett Digital Manuscript Project. Ronald Haentjens Dekker, Dirk van Hulle, Gregor Middell, Vincent Neyt, and Joris van Zundert. Literary and linguistic computing: the journal of digital scholarship in the humanities, Vol. 25 (2014-03-19): 1–19.
Acknowledgements: USB sticks generously contributed by eXist-db. Koala-keyboard image from Buzzfeed.