Document Engineering

Semester: Fall 2020

Fall semester 2020:
The course will be held online.

Note: In the fall semester 2020, this course is exceptionally part of both the master's program in digital humanities (MA HN) and the bachelor's program in ISH (BA ISH). For BA ISH students, only the practical part (referred to below as "Part B") is mandatory.


The course consists of two parts, a seminar-style theoretical part (Part A, Thursday sessions) and a practical part (Part B, Friday sessions). Consequently, the course also has two sets of objectives.

Concerning Part A, the theoretical part, at the end of this course you should:

Concerning Part B, the practical part, at the end of this course you should be able to:


Part A

Potential section topics:

  1. UNIX
  2. Emacs
  3. From typewriters to word processors
  4. From Gutenberg to the Linotype
  5. Editing text on the computer: the challenge of interactive computing
  6. WYSIWYG: from Bravo to Word
  7. Computer typesetting: runoff, troff, TeX, PostScript, etc.
  8. Generic markup, SGML, XML
  9. Writing systems
  10. Hypertext: NLS, Xanadu, etc.
  11. Programming languages for text processing
  12. Document engineering and DH

Part B

Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents. (↗ The ACM Symposium on Document Engineering)

As the name of this course, "document engineering" is admittedly a bit pretentious: we will not engineer new systems but rather use existing tools. Nevertheless, the course does touch on a number of important basic concepts in document engineering.

I aim to cover the following aspects:

The goal is to enable you to transform plain text into XML, which is the basis for advanced processing, including the transformation of (parts of) the XML data into something else.

Specifically, you will transform a semi-structured document, such as a play or a dictionary, from plain text to an HTML presentation.
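To give a rough idea of what such a pipeline can look like, here is a minimal sketch in Python. It is not the method or toolchain used in the course. The input convention (one speech per line, written as "SPEAKER: text"), the element names play, speech, speaker and line, and the function names text_to_xml and xml_to_html are all assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input: one speech per line, "SPEAKER: text".
PLAY = """\
HAMLET: To be, or not to be, that is the question.
OPHELIA: Good my lord, how does your honour for this many a day?
HAMLET: I humbly thank you; well, well, well.
"""

# Assumed convention: an upper-case speaker name followed by a colon.
SPEECH = re.compile(r"^([A-Z][A-Z ]*):\s*(.+)$")


def text_to_xml(text):
    """Step 1: turn 'SPEAKER: text' lines into a <play> element tree."""
    play = ET.Element("play")
    for raw in text.splitlines():
        match = SPEECH.match(raw)
        if match is None:
            continue  # skip lines that do not look like a speech
        speech = ET.SubElement(play, "speech")
        ET.SubElement(speech, "speaker").text = match.group(1).strip()
        ET.SubElement(speech, "line").text = match.group(2)
    return play


def xml_to_html(play):
    """Step 2: render the <play> tree as a simple HTML fragment."""
    div = ET.Element("div", {"class": "play"})
    for speech in play.iter("speech"):
        p = ET.SubElement(div, "p")
        b = ET.SubElement(p, "b")
        b.text = speech.findtext("speaker")
        b.tail = " " + speech.findtext("line")
    return ET.tostring(div, encoding="unicode", method="html")


if __name__ == "__main__":
    tree = text_to_xml(PLAY)
    print(ET.tostring(tree, encoding="unicode"))  # intermediate XML
    print(xml_to_html(tree))                      # HTML presentation
```

The point of the intermediate XML step is that the structure (who speaks, what is said) is made explicit once and can then be rendered in several ways. In practice the second step is often written in a dedicated transformation language such as XSLT rather than in a general-purpose language.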