Computational Literary Studies Infrastructure

challenges in the post-COVID era

Maciej Eder (maciej.eder@ijp.pan.pl)

10.07.2023


TwinTalks 4
Digital Humanities 2023, Graz, 10–16.07.2023

introduction

First, what CLS is about

  • Computational Literary Studies
  • Aimed at analyzing (large amounts of) textual data…
  • … by computational techniques

Leon Battista Alberti

Leon Battista Alberti, De componendis cifris, ca. 1466

Computation into criticism

John Burrows, Computation into Criticism, 1987

Distant reading

Franco Moretti, Matt Jockers, Ted Underwood

Sociology of reading

Karina van Dalen-Oskam, Het raadsel literatuur, 2021

Quantitative linguistics

Foundations of CLS

  • Computation into criticism
  • Distant reading
  • Stylometry
  • Authorship attribution
  • Digital humanities
  • Language resources
  • Digital libraries
  • Natural language processing
  • Machine learning
  • Big data

What CLS has to offer

  • Scientific method
    • reproducibility, empirical paradigm, statistical modeling, probabilistic inference, …
  • Scale
    • access to unprecedented amounts of data
  • Accuracy
    • ability to capture patterns invisible to the naked eye

Blaise Pascal and the two infinities

Combination of factors needed

  • Datasets (language resources)
  • Tools (computer programs)
  • Suitable methodology
  • Computing power (i.e. scientific instruments)

Not possible individually

Research infrastructures

Libraries, journals, publishers, …

Dictionaries at IJP PAN

ELTeC corpus

DraCor

CLS INFRA

An infrastructural project for computational literary studies, funded under the Horizon 2020 scheme

infrastructures in DH and CLS

  • in the hard sciences, infrastructures are tangible
    • servers, telescopes, accelerators, …
  • in the humanities, institutions are essential
    • libraries, publishing houses, journals, …
  • in DH, multifaceted needs
    • the notion of infrastructure needs reconsideration
    • corpora (FAIR!), but not only corpora

CLS INFRA project

  • text collections (corpora)
    • quality
    • metadata
    • conversion
  • methodology
    • tools (NLP, datavis, …)
    • tool chains
    • methodological considerations
    • bibliographic survey
  • network of scholars
    • training schools
    • short-term research stays
    • collaboration with COST Action

Overarching idea is to connect…

  • People
    • To establish a network of CLS researchers
  • Data
    • To consolidate existing high-quality corpora…
    • …covering prose, drama and poetry
  • Tools
    • To build a chain of NLP tools to analyze texts
  • Methods
    • To provide a survey of state-of-the-art methods

The project’s structure

The team (30+ people)

Julie Birkholz, Ingo Börner, Joanna Byszuk, Sally Chambers, Vera Maria Charvat, Silvie Cinková, Tess Dejaeghere, Julia Dudar, Matej Ďurčo, Maciej Eder, Jennifer Edmond, Evgeniia Fileva, Frank Fischer, Serge Heiden, Michal Křen, Bartłomiej Kunda, Michał Mrugalski, Ciara Murphy, Carolin Odebrecht, Marco Raciti, Salvador Ros, Christof Schöch, Artjoms Šeļa, Toma Tasovac, Justin Tonra, Erzsébet Tóth-Czifra, Peer Trilcke, Karina van Dalen-Oskam, Lisanne van Rossum

activities

training schools

  • Prague 2022
    • NLP tools
    • 25 participants on site
    • many more remotely
  • Madrid 2023
    • text analysis
    • 10–11 May 2023
  • Vienna 2024
    • corpus queries
    • tentative dates: spring 2024

TNA

  • transnational access
  • short-term research stays…
  • in one of 6 institutions:
    • NUI Galway
    • Uni Potsdam
    • Uni Trier
    • UNED Madrid
    • OEAW Vienna
    • Charles Uni, Prague
  • everyone eligible
  • two calls every year

deliverables

deliverables published

  • 3.1 Report on the methodological baseline for (computational) literary studies
  • 4.1 Report on the skills matrix for computational literary studies
  • 5.1 Review of the data landscape
  • 6.1 Assembly of existing data

survey of methods

post-COVID lessons

gitlab repositories

gitlab issues boards

file storage cloud

chatter (Discord)

chatter (Mattermost)

teleconferencing services

  • Jitsi
  • BigBlueButton
  • Zoom

6 good habits

1. Work in real time

  • schedule ad-hoc meetings to actually work together
  • use chat and not e-mail to be present in instant conversations

2. Director is visible

  • have a stable and supportive central presence for daily matters
  • schedule informal coffee sessions on Zoom

3. Forge personal bonds

  • talk to each other informally even when the work does not require it
  • … consider this to be an investment

4. Transfer responsibility

  • put trust in executive experts who know the project inside out
  • they can brief their supervisors for targeted strategic involvement

5. Celebrate success

  • share enthusiasm about achievements
  • this makes project members feel seen
  • it maintains positive momentum

6. It’s (really) not talent

  • project management is about initiative
  • it is not a skill that some people possess while others do not