MSc · Elective — Specialization

Natural Language Processing
for the Byzantine Corpus

Postgraduate Programme — Level 7

10 ECTS Credits

3 hrs / week

English Language

Open to Erasmus

Prerequisites: Basic Python programming

LEARNING OUTCOMES

Upon completion, students will be able to:

Explain core NLP concepts (tokenization, lemmatization, named entity recognition, text classification, topic modelling) and their relevance to Byzantine philology and historical research.

Use Python and standard NLP libraries (CLTK, spaCy, NLTK, pandas) to load, preprocess, and analyse Byzantine Greek texts from major digital repositories.

Design and apply annotation guidelines for Byzantine named entities (persons, offices, toponyms, dynasties, dates) and produce annotated datasets using Prodigy or Label Studio.

Train, evaluate, and critically interpret a domain-specific NER model using precision, recall, and F1 metrics.

Apply stylometric and topic modelling methods to Byzantine corpora and interpret results within their historical and philological context.

Critically evaluate the possibilities and limitations of large language models and AI tools when applied to Medieval Greek texts.

Design and deliver an end-to-end NLP pipeline project on a Byzantine text, integrating philological expertise with computational methods.

COURSE SYLLABUS

13 Modules

Week 01 | Working with text in Python

Digital resources for Byzantine texts.

Week 02 | Prompt Literacy for Humanities Research

Loading Byzantine texts, basic string operations.

Week 03 | Preprocessing Medieval Greek

Tokenization, normalization, polytonic Unicode, scribal abbreviations. Tools: NLTK, spaCy, CLTK.
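
A minimal sketch of the normalization step, assuming only Python's standard `unicodedata` module. The sample phrase, the whitespace tokenizer, and the decision to strip all diacritics are illustrative assumptions; whether accents and breathings should be removed depends on the research question.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents, breathings, iota subscript).

    The text is decomposed with NFD so that each diacritic becomes a
    separate combining character, which is then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalize_token(token: str) -> str:
    """Lowercase a token and strip its diacritics for frequency counting."""
    return strip_diacritics(token).lower()


sample = "Ἐν ἀρχῇ ἦν ὁ λόγος"  # illustrative sample, not a course text
tokens = [normalize_token(t) for t in sample.split()]  # naive whitespace split
print(tokens)

# The same visible string can be stored precomposed (NFC) or decomposed (NFD);
# the two forms compare unequal until both are normalized to one form.
nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", sample)
print(nfc == nfd, unicodedata.normalize("NFC", nfd) == nfc)
```

Normalizing every source to a single form (NFC is a common choice) before tokenization avoids spurious mismatches between texts drawn from different repositories.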

Week 04 | Morphological analysis and lemmatization

Challenges of Greek inflection; evaluating CLTK on Byzantine texts.

Week 05 | Corpus statistics and stylometry

Word frequency, Zipf’s Law, authorship analysis applied to Byzantine texts.
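
A minimal sketch of a rank-frequency table, assuming a pre-tokenized text. The token list is a toy example; Zipf's Law predicts that rank multiplied by frequency stays roughly constant only on corpora far larger than this.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count, rank * count) rows, most frequent first."""
    counts = Counter(tokens)
    rows = []
    for rank, (token, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, token, count, rank * count))
    return rows


# Toy token list for illustration only.
tokens = "καὶ ὁ βασιλεὺς καὶ ὁ στρατὸς καὶ ἡ πόλις".split()
for row in rank_frequency(tokens):
    print(row)
```

On a real corpus, plotting log(rank) against log(count) and checking for an approximately straight line is the usual visual test.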

Week 06 | Named Entity Recognition: concepts

Entity types in Byzantine texts; why off-the-shelf NER fails; IOB tagging scheme.
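
A minimal sketch of the IOB scheme, assuming entities are given as token-index spans. The labels `PER` and `LOC` and the example tokens are hypothetical; the course's own label set comes from its annotation guidelines.

```python
def spans_to_iob(tokens, spans):
    """Convert entity spans to IOB tags.

    spans: list of (start, end, label) with token indices, end exclusive.
    The first token of an entity gets B-<label>, later tokens I-<label>,
    and every token outside an entity gets O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example: a two-token person name and a one-token place name.
tokens = ["Ἀλέξιος", "Κομνηνὸς", "ἐν", "Κωνσταντινουπόλει"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]

for token, tag in zip(tokens, spans_to_iob(tokens, spans)):
    print(f"{token}\t{tag}")
```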

Week 07 | Annotation practice

Designing annotation guidelines for Byzantine entities.

Week 08 | Training and evaluating a NER model with spaCy

Evaluation metrics: precision, recall, F1. Error analysis.
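
A minimal sketch of entity-level scoring, assuming an exact-match criterion: a predicted entity counts as correct only if its boundaries and label both match the gold annotation. The spans and labels below are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold entities by exact match.

    Each entity is a (start, end, label) tuple.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: 3 gold entities, 4 predictions, 2 exact matches.
gold = {(0, 2, "PER"), (3, 4, "LOC"), (7, 8, "OFFICE")}
predicted = {(0, 2, "PER"), (3, 4, "PER"), (7, 8, "OFFICE"), (10, 11, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here the span (3, 4) has the right boundaries but the wrong label, so it counts as both a false positive and a missed gold entity; cases like this are a natural starting point for error analysis.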

Week 09 | Text classification and topic modelling

Genre classification. Critical interpretation.

Week 10 | Knowledge graphs and linked data

Entity linking to PBW, Pleiades, Wikidata; introduction to RDF triples.
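
A minimal sketch of RDF triples as plain Python tuples, serialized in N-Triples style. The `example.org` URIs are hypothetical placeholders, not real PBW, Pleiades, or Wikidata identifiers; the two predicate URIs are the standard `rdfs:label` and `owl:sameAs`. Literal escaping is omitted, so this is not a complete serializer.

```python
EX = "http://example.org/byz/"  # hypothetical namespace for local entities
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line.

    Objects starting with "http" are written as URIs, anything else as a
    plain string literal (no escaping, language tags, or datatypes).
    """
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = [
    # A local entity with a human-readable label.
    (EX + "person/1", RDFS_LABEL, "Alexios I Komnenos"),
    # Entity linking: the local entity is the same as an authority record
    # (placeholder URI; a real pipeline would use the authority's own URI).
    (EX + "person/1", OWL_SAME_AS, "http://example.org/authority/placeholder-id"),
]

print(to_ntriples(triples))
```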

Week 11 | Large language models and Byzantine texts

Prompting strategies; critical evaluation of LLM performance on Medieval Greek; hallucination in historical contexts.

Week 12 | Project workshop

Student presentations of NER pipeline progress; peer feedback; troubleshooting.

Week 13 | Final presentations

Closing discussion: computational text analysis in Byzantine scholarship.

ASSESSMENT

Student Evaluation

40%

Weekly exercises

formative; submitted via course platform; graded on correctness, code quality, and critical reflection

60%

Final Project

summative; NER pipeline on a Byzantine text of the student’s choosing, including annotation manual, trained model, evaluation report, and public presentation

Workload — ECTS Distribution

250 Hours Total

Lectures

Weekly exercises

Final project and presentation

120

Course Total

250

Recommended Bibliography


Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O’Reilly.
Jurafsky, D., & Martin, J. H. (2026). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/
