The Grundtvig eStudies Pamphlets


The Centre for Grundtvig Studies conducts a series of computational explorations of the digital Grundtvig corpus and of contextualizing corpora such as Mediestream, the Danish Royal Library’s digital newspaper archive (https://www2.statsbiblioteket.dk/mediestream/).

The conclusions drawn from this work are published in a growing number of papers, books, and articles. The Grundtvig eStudies Pamphlets offer the raw data and methodological declarations for each of these studies.

Grundtvig’s collected writings are available in XML (N = 1073) following the TEI guidelines. Currently 42% are richly annotated, and the process of enriching the data is ongoing. The project is scheduled for completion in 2029. The data set has a median document size of four pages and contains 3,968,841 word tokens distributed over 115,240 word types. The data are available at: https://github.com/centre-for-humanities-computing/grundtvig-data. The data are made available in this format through an agreement with Grundtvig Study Centre. We have also made a custom XML parser available to facilitate third-party data exploration: https://github.com/centre-for-humanities-computing/GrundtvigParser.
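
As a rough illustration of how token and type counts like those above can be derived from TEI files, the following Python sketch uses only the standard library. It is not the project's GrundtvigParser and does not reproduce its API. The tokenization rule (lowercased runs of word characters), the restriction to the TEI body element, the helper names, and the directory path in the usage note are all assumptions made for this example, so the resulting numbers will not necessarily match the figures reported here.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Standard TEI namespace; assumed to be the one used by the corpus files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed tokenization rule: a token is any run of Unicode word characters.
TOKEN_RE = re.compile(r"\w+")


def tei_body_text(xml_source):
    """Return the text inside the TEI <body>, or the whole document if absent."""
    root = ET.fromstring(xml_source)
    body = root.find(".//tei:body", TEI_NS)
    node = body if body is not None else root
    return " ".join(node.itertext())


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def corpus_counts(xml_sources):
    """Return (number of word tokens, number of word types) over all documents."""
    counts = Counter()
    for source in xml_sources:
        counts.update(tokenize(tei_body_text(source)))
    return sum(counts.values()), len(counts)


def load_sources(data_dir):
    """Read every .xml file below data_dir as bytes (hypothetical layout)."""
    return [p.read_bytes() for p in sorted(Path(data_dir).rglob("*.xml"))]


# Invented toy document for illustration only; not taken from the corpus.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc><titleStmt><title>Toy</title></titleStmt></fileDesc></teiHeader>"
    "<text><body><p>en dag og en nat</p></body></text>"
    "</TEI>"
)

tokens, types = corpus_counts([SAMPLE])
print(tokens, types)  # 5 tokens over 4 types: en, dag, og, nat
```

To run the same count over a local clone of the data repository, one could call corpus_counts(load_sources("grundtvig-data")), where the directory name is a placeholder for wherever the files have been downloaded.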