From Texts to Networks
Workshop in Network Analysis
Maciej Eder
University of Tartu | Polish Academy of Sciences
06/05/2026
Introduction
Aims of the study
Assessing a large number of texts
Leveraging existing explanatory methods
Automatic analysis of textual relations…
… that will reveal groups of similar texts
Mapping literature
Chronology
Genre
Imitation
Authorship
…
Basic concepts
“Computation into criticism” (John Burrows)
“Distant reading” (Franco Moretti)
“Macroanalysis” (Matthew Jockers)
Big Data
Authorship attribution
Cluster analysis dendrograms
Highly dependent on feature selection
Highly dependent on similarity measure
Highly dependent on linkage algorithm
No validation provided
136 most frequent words
137 most frequent words
Unstable results
Noise?
Unreliability of the linkage procedure?
More than one stylistic layer?
Layers
Networks to the rescue
A new technique: assumptions
Explanatory power of dendrograms
Stable results and/or validation
Scalability: 500 texts? 1000 texts? 31M texts?
Authorship attribution
An anonymous text is tested against a set of “candidates”
Goal: to find the nearest neighbor
Effective style-marker: most frequent words (grammatical words)
Establishing a distance
Another distance…
Checking each sample
Deciding which distance is shortest
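The attribution steps above (MFW frequencies, a distance to every candidate, then the shortest distance) can be sketched in Python. This is a minimal illustration, not the author's implementation. The slides do not name a distance measure, so Burrows's Delta (mean absolute difference of z-scored MFW frequencies) is assumed here. The function names are hypothetical. As a simplification, the z-scores are computed over all texts including the anonymous one.

```python
from collections import Counter

def most_frequent_words(corpus, n):
    """The n most frequent words across all tokenized texts in the corpus."""
    counts = Counter()
    for tokens in corpus.values():
        counts.update(tokens)
    return [w for w, _ in counts.most_common(n)]

def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word within one text."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def z_scores(profiles):
    """Standardize every feature (word) across all texts."""
    names = list(profiles)
    n_feat = len(profiles[names[0]])
    z = {name: [] for name in names}
    for j in range(n_feat):
        col = [profiles[name][j] for name in names]
        mean = sum(col) / len(col)
        sd = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5 or 1.0  # guard constant features
        for name in names:
            z[name].append((profiles[name][j] - mean) / sd)
    return z

def delta(a, b):
    """Assumed distance: Burrows's Delta, the mean absolute z-score difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def attribute(anonymous, candidates, n_mfw=100):
    """Return the nearest candidate and the distances to all candidates."""
    corpus = dict(candidates)
    corpus["?"] = anonymous
    vocab = most_frequent_words(corpus, n_mfw)
    z = z_scores({name: relative_freqs(toks, vocab) for name, toks in corpus.items()})
    distances = {name: delta(z["?"], z[name]) for name in candidates}
    return min(distances, key=distances.get), distances
```

Candidate texts are passed as a dict mapping a label to a list of tokens; the anonymous text is a token list of its own.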
From attribution to stylometry
If it works with anonymous texts, …
… what about extending the same procedure?
What about applying it to all texts?
Distances to txt1
Distances to txt2
Distances to txt3
A resulting network
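Extending attribution to every text means treating each text in turn as the "anonymous" one and linking it to its nearest neighbor. A minimal sketch, assuming each text is already represented as a vector of normalized MFW frequencies and assuming a Delta-style distance (the slides do not specify one); `nearest_neighbor_edges` is a hypothetical name.

```python
def delta(a, b):
    """Assumed distance: mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def nearest_neighbor_edges(profiles, distance=delta):
    """Link every text to the single other text closest to it.

    Returns directed (source, target, distance) triples, one per text.
    """
    edges = []
    for name, vec in profiles.items():
        others = {o: distance(vec, v) for o, v in profiles.items() if o != name}
        best = min(others, key=others.get)
        edges.append((name, best, others[best]))
    return edges
```

Because each text contributes exactly one outgoing link, two texts that pick each other produce a reciprocal pair.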
From stylometry to networks
Nearest neighbors can be represented as connected nodes of a network.
A variety of layout algorithms can be applied.
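Layout is usually left to dedicated network software. One possible route, sketched here as an assumption, is to export the links as a CSV edge table with `Source`, `Target`, `Type` and `Weight` columns, a layout that tools such as Gephi can import. Here, reciprocal links (A to B and B to A) are merged into one undirected edge whose weight is the sum of both directions; that merging rule is an illustrative choice, and `edges_to_csv` is a hypothetical helper.

```python
import csv
import io

def edges_to_csv(edges):
    """Serialize (source, target, weight) triples as an undirected edge table.

    `weight` is a connection strength (e.g. 1 per nearest-neighbor link),
    not a distance; reciprocal links are summed into one edge.
    """
    merged = {}
    for source, target, weight in edges:
        key = tuple(sorted((source, target)))
        merged[key] = merged.get(key, 0) + weight
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(merged.items()):
        writer.writerow([source, target, "Undirected", weight])
    return buf.getvalue()
```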
Example
Reliability, stupid!
Instead of one analysis (e.g. 100 MFWs)…
… a whole range (100, 200, 300 MFWs, etc.).
Next, the individual “snapshots” are summarized.
100 MFWs
200 MFWs
300 MFWs
Consensus network
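A consensus network can be sketched by repeating the nearest-neighbor step for several MFW settings and counting how often each link recurs. This is a hedged illustration: summarizing snapshots by simple link counts, the Delta-style distance and all function names are assumptions, not a description of the original software. An edge's weight is the number of snapshots in which it appears, so stable links get heavier than accidental ones.

```python
from collections import Counter

def profiles_for(corpus, n_mfw):
    """z-scored relative frequencies of the n_mfw most frequent words."""
    totals = Counter()
    for tokens in corpus.values():
        totals.update(tokens)
    vocab = [w for w, _ in totals.most_common(n_mfw)]
    rel = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        rel[name] = [counts[w] / len(tokens) for w in vocab]
    k = len(corpus)
    z = {name: [] for name in corpus}
    for j in range(len(vocab)):
        col = [rel[name][j] for name in corpus]
        mean = sum(col) / k
        sd = (sum((x - mean) ** 2 for x in col) / k) ** 0.5 or 1.0  # guard constant features
        for name in corpus:
            z[name].append((rel[name][j] - mean) / sd)
    return z

def delta(a, b):
    """Assumed distance: mean absolute z-score difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def consensus_network(corpus, mfw_range=(100, 200, 300)):
    """Count directed nearest-neighbor links across several MFW snapshots."""
    weights = Counter()
    for n in mfw_range:
        z = profiles_for(corpus, n)
        for name in corpus:
            others = [o for o in corpus if o != name]
            nearest = min(others, key=lambda o: delta(z[name], z[o]))
            weights[(name, nearest)] += 1
    return dict(weights)
```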
Importance of runners-up
Nearest neighbors = the most similar
Runners-up: do they really deserve to be filtered out?
Solution: more connections!
3 neighbors to txt1
3 neighbors to txt2
3 neighbors to txt3
A resulting network
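Keeping the runners-up amounts to linking each text to its k nearest neighbors rather than to one. A minimal sketch, assuming precomputed feature vectors, a Delta-style distance, and rank-based weights of 1, 0.5 and 0.25 for the first, second and third neighbor. The weights are an illustrative assumption, and `k_neighbor_edges` is a hypothetical name.

```python
def delta(a, b):
    """Assumed distance: mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def k_neighbor_edges(profiles, k=3, weights=(1.0, 0.5, 0.25)):
    """Link each text to its k nearest neighbors with rank-decreasing weights.

    Returns {(source, target): weight}; the nearest neighbor gets weights[0].
    """
    if k > len(weights):
        raise ValueError("need one weight per neighbor rank")
    edges = {}
    for name, vec in profiles.items():
        ranked = sorted((o for o in profiles if o != name),
                        key=lambda o: delta(vec, profiles[o]))
        for rank, other in enumerate(ranked[:k]):
            edges[(name, other)] = weights[rank]
    return edges
```

With k=1 this reduces to the plain nearest-neighbor network; larger k keeps links that a single-neighbor network would filter out.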
Two algorithms together
Latin literature at a glance
Traces of chronology?
“Ciceronianus es, non Christianus!” (God accusing St Jerome)
“Renovatio antiquitatis” (Renaissance humanists)
Hypothesis: noticeable traces of chronology.
How influential was Cicero?
Ciceronian Quarrel: the single most important debate of the Renaissance.
Imitation of the admirable Ciceronian style.
Hypothesis: visible traces of Cicero in early modern Latin writings.
Conclusions
No clear chronological pattern.
Genre signal: predominant.
Masters of style tend to remain outside the network.
Further research
More texts!
More genres!
More “Attic” writers (e.g. Lipsius).