MSc · Elective — Specialization

Natural Language Processing for the Byzantine Corpus

Postgraduate Programme — Level 7

10 ECTS Credits
3 hrs / week
Language: English
Open to Erasmus
Prerequisites: Basic Python programming

LEARNING OUTCOMES

Upon completion, students will be able to:

1) Explain core NLP concepts (tokenization, lemmatization, named entity recognition, text classification, topic modelling) and their relevance to Byzantine philology and historical research.

2) Use Python and standard NLP libraries (CLTK, spaCy, NLTK, pandas) to load, preprocess, and analyse Byzantine Greek texts from major digital repositories.

3) Design and apply annotation guidelines for Byzantine named entities (persons, offices, toponyms, dynasties, dates) and produce annotated datasets using Prodigy or Label Studio.

4) Train, evaluate, and critically interpret a domain-specific NER model using precision, recall, and F1 metrics.

5) Apply stylometric and topic modelling methods to Byzantine corpora and interpret results within their historical and philological context.

6) Critically evaluate the possibilities and limitations of large language models and AI tools when applied to Medieval Greek texts.

7) Design and deliver an end-to-end NLP pipeline project on a Byzantine text, integrating philological expertise with computational methods.
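
Outcome 4 names precision, recall, and F1, and the syllabus introduces the IOB tagging scheme. The sketch below shows one way the two can fit together at entity level. The toy phrase, the entity labels (OFF, PER, LOC), and the function names are illustrative assumptions, not course materials. Only exact matches of span and type count as correct, which is one common convention among several.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def iob_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from IOB2 tags such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        # An open entity continues only on an I- tag of the same type.
        continues = label is not None and tag == f"I-{label}"
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray I- tag with no open entity is ignored (strict reading).
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction is correct only if span and type both match."""
    gold_spans, pred_spans = iob_to_spans(gold), iob_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy phrase invented for illustration; the labels are hypothetical.
tokens = ["ὁ", "βασιλεὺς", "Ἀλέξιος", "Κομνηνὸς", "ἐν", "Κωνσταντινουπόλει"]
gold = ["O", "B-OFF", "B-PER", "I-PER", "O", "B-LOC"]
pred = ["O", "O", "B-PER", "O", "O", "B-LOC"]  # misses the office, truncates the person

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# prints: precision=0.50 recall=0.33 F1=0.40
```

The truncated person name counts as both a false positive and a false negative here. This is why error analysis on span boundaries matters when interpreting the scores.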

COURSE SYLLABUS

13 Modules

Week 01 | Working with text in Python

Digital resources for Byzantine texts.

Loading Byzantine texts, basic string operations.

Tokenization, normalization, polytonic Unicode, scribal abbreviations. Tools: NLTK, spaCy, CLTK.

Challenges of Greek inflection; evaluating CLTK on Byzantine texts.

Word frequency, Zipf’s Law, authorship analysis applied to Byzantine texts.

Entity types in Byzantine texts; why off-the-shelf NER fails; IOB tagging scheme.

Designing annotation guidelines for Byzantine entities.

Evaluation metrics: precision, recall, F1. Error analysis.

Genre classification. Critical interpretation.

Entity linking to PBW, Pleiades, Wikidata; introduction to RDF triples.

Prompting strategies; critical evaluation of LLM performance on Medieval Greek; hallucination in historical contexts.

Student presentations of NER pipeline progress; peer feedback; troubleshooting.

Closing discussion: computational text analysis in Byzantine scholarship.
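
Several modules above mention normalization, polytonic Unicode, and word frequency. The sketch below illustrates those steps using only the Python standard library. The sample sentence (the opening of John's Gospel) is a placeholder rather than a Byzantine text, the function names are hypothetical, and stripping all diacritics and folding final sigma are simplifying assumptions that a real project might not want.

```python
import re
import unicodedata
from collections import Counter
from typing import List

# Placeholder sample, not a Byzantine text.
SAMPLE = "Ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς ἦν ὁ λόγος."


def strip_diacritics(text: str) -> str:
    """Drop accents, breathings, and iota subscripts (all combining marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def normalize(text: str) -> str:
    """Strip diacritics, lowercase, and fold final sigma into medial sigma."""
    return strip_diacritics(text).lower().replace("ς", "σ")


def tokenize(text: str) -> List[str]:
    """Split normalized text into runs of word characters."""
    # Normalizing first matters: in decomposed text, combining marks
    # are not word characters and would split tokens.
    return re.findall(r"\w+", normalize(text))


tokens = tokenize(SAMPLE)
counts = Counter(tokens)

# Zipf's law predicts that rank * frequency stays roughly constant
# in a large corpus; a single sentence is far too small to show it.
for rank, (word, freq) in enumerate(counts.most_common(5), start=1):
    print(rank, word, freq, rank * freq)
```

A side effect of the decomposition step is that the two Unicode encodings of an accented vowel (tonos and oxia) collapse to the same base letter, so variant encodings across repositories no longer produce distinct tokens.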

ASSESSMENT

Student Evaluation

Weekly exercises (40%): formative; submitted via course platform; graded on correctness, code quality, and critical reflection.

Final project (60%): summative; NER pipeline on a Byzantine text of the student’s choosing, including annotation manual, trained model, evaluation report, and public presentation.

Workload — ECTS Distribution

250 Hours Total

Lectures: 39 hours

Weekly exercises: 91 hours

Final project and presentation: 120 hours

Course total: 250 hours

Recommended Bibliography

  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O’Reilly.

  • Jurafsky, D., & Martin, J. H. (2026). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/
