Document Engineering
Semester: Fall 2020
Fall semester 2020:
The course will be held online.
Note: In the fall semester 2020, this course is exceptionally part of both the master's programme in digital humanities (MA HN) and the bachelor's programme in ISH (BA ISH). For BA ISH students, only the practical part (called "Part B" below) is mandatory.
Objective
The course consists of two parts, a seminar-style theoretical part (Part A, Thursday sessions) and a practical part (Part B, Friday sessions). Consequently, the course also has two sets of objectives.
Concerning Part A, the theoretical part, at the end of this course you should:
- have an understanding of the historical development of document engineering and be able to reflect upon document technology in a historical perspective
- have an understanding of the larger context of document technologies
- be able to read scholarly publications in the field
- be able to summarize your reading, to draw connections between historical and current developments, and to express your ideas in writing
Concerning Part B, the practical part, at the end of this course you should be able to:
- create academic documents using LaTeX
- correctly apply the basic rules of French, English, and German typography
- use the UNIX shell and the standard UNIX text processing tools
- use the Emacs text editor
- use regular expressions
- understand the basic notions of text encoding, notably Unicode
- understand the fundamental concepts of XML
- create and edit XML documents
- create XSLT stylesheets for transforming XML documents
- combine different tools to effectively process plain text and structured documents
Content (IN PROGRESS)
Part A
Potential section topics:
- UNIX
- Emacs
- From typewriters to word processors
- From Gutenberg to the Linotype
- Editing text on the computer: the challenge of interactive computing
- WYSIWYG: from Bravo to Word
- Computer typesetting: runoff, troff, TeX, PostScript, etc.
- Generic markup, SGML, XML
- Writing systems
- Hypertext: NLS, Xanadu, etc.
- Programming languages for text processing
- Document engineering and DH
Part B
Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents. (↗ The ACM Symposium on Document Engineering)
As the name of this course, "document engineering" is admittedly a bit pretentious: we will not engineer new systems but rather use existing tools. Nevertheless, the course does touch on a number of important basic concepts in document engineering.
I aim to cover the following aspects:
- Using LaTeX to create academic documents (e.g., seminar papers or your thesis).
- Using the UNIX command line for manipulating files and for basic text processing.
- Using the Emacs text editor for interactive manipulation of documents, including the use of regular expressions.
- XML for representing structured documents.
- Using XSLT to transform XML documents.
The goal is to enable you to transform plain text into XML, which is the basis for advanced processing, including the transformation of (parts of) the XML data into something else.
Specifically, you will transform a semi-structured document, such as a play or a dictionary, from plain text to an HTML presentation.
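To give a rough idea of what such a pipeline looks like, here is a minimal sketch in Python. It is only an illustration and not the method used in the course, where the text-to-XML step relies on the shell, Emacs, and regular expressions, and the XML-to-HTML step on XSLT. The input format (a speaker name in capitals followed by a period), the element names `play`, `speech`, `speaker`, and `line`, and the function names are all assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text input: speaker in capitals, a period, then the spoken line.
PLAY_TEXT = """\
HAMLET. To be, or not to be, that is the question.
OPHELIA. Good my lord, how does your honour for this many a day?
"""

# Regular expression capturing the speaker name and the rest of the line.
SPEECH = re.compile(r"^([A-Z][A-Z ]*)\.\s+(.*)$")


def text_to_xml(text):
    """Turn semi-structured plain text into an XML tree (assumed element names)."""
    play = ET.Element("play")
    for raw in text.splitlines():
        match = SPEECH.match(raw.strip())
        if match is None:
            continue  # skip lines that do not look like a speech
        speech = ET.SubElement(play, "speech")
        ET.SubElement(speech, "speaker").text = match.group(1)
        ET.SubElement(speech, "line").text = match.group(2)
    return play


def xml_to_html(play):
    """Render the XML tree as a simple HTML fragment (stand-in for an XSLT stylesheet)."""
    body = ET.Element("body")
    for speech in play.iter("speech"):
        p = ET.SubElement(body, "p")
        b = ET.SubElement(p, "b")
        b.text = speech.findtext("speaker")
        b.tail = ": " + speech.findtext("line")
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    play = text_to_xml(PLAY_TEXT)
    print(ET.tostring(play, encoding="unicode"))
    print(xml_to_html(play))
```

The point of the intermediate XML step is that the structure (who speaks which line) is made explicit once and can then be reused for several outputs, of which HTML is only one.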