The model was trained on materials from the collection of the Scientific Commission of the Riga Latvian Society, held at the Archives of Latvian Folklore (ALF, Institute of Literature, Folklore and Art, University of Latvia). This is the oldest and most extensive collection at ALF. Its unique manuscripts span various folklore genres and also include ethnographic records, linguistic materials, place-name documentation, songs, riddles, word explanations, and other testimonies of traditional culture, dialects, and Latvian cultural history.
The model was trained with AI-based text recognition technology on manuscript transcriptions previously produced by the ALF volunteer community. The training data comprised 2,671 pages, with more than 367,000 words and 132,000 lines of text. The model published in Transkribus has a character error rate (CER) of 4.83%, or roughly five character errors per hundred characters of text.
The model was developed as part of ȬPEN: Open Knowledge Ecosystems for the Advancement of Citizen Science (ZDA-LIP 2025/2), a project funded by the University of Latvia. Its development brought together several University of Latvia units (the Digital Humanities Centre of the Faculty of Humanities, the University Library, and the Archives of Latvian Folklore of the Institute of Literature, Folklore and Art) as well as Transkribus.
“This model is an important step towards expanding access to Latvia’s handwritten heritage. It not only accelerates manuscript transcription, but also opens up new opportunities for research, the creation of digital collections, and public participation in exploring cultural heritage. It is especially important that the model is relatively open. All registered Transkribus users can use it in their own projects and continue improving it,” says Associate Professor Sanita Reinsone, project leader and head of the Digital Humanities Centre of the Faculty of Humanities, University of Latvia.
Work on the project continues with the development of a text recognition model for 20th-century Latvian handwriting, which will extend automated recognition to a wider range of Latvian handwritten materials.
The model is available on the Transkribus platform under the name “Latvian 19th century”.