I am collaborating with Quinn Dombrowski on a project to create XML resources to facilitate the linguistic and historical study of the Novgorod birchbark letters. In particular, I am developing a Unicode-compliant version of Zaliznjak 2004's vocabulary, together with a markup of that vocabulary that matches the orthography of vowels to their Common Slavic etymology. A sample of the Unicode-compliant vocabulary may be found here, while a very small sample of the orthography markup can be found in the handout of a talk I gave in April 2010. We are also developing resources to track the people mentioned in the birchbark letters. More details about this aspect of the project, as well as technical specifications, can be found on Quinn’s site.
The birchbark letters (BBLs) have attracted the attention of researchers ever since their discovery in 1951. For historical linguistics they have been invaluable, because they have made it possible to describe and analyze an Early East Slavic dialect markedly different from the varieties attested in other sources. The XML resources that we are developing aim to provide tools for tackling some persistent issues in the scholarly analysis of the BBLs. In particular, we hope to address the following questions:
- What information can orthography provide about phonology? It is well known that the BBLs differ sharply from other Early East Slavic documents both orthographically and linguistically. The relationship between orthography and phonology is by no means straightforward. Markup of the orthography-etymology relationship will allow a more precise and quantitative analysis of orthographic patterns, thereby enabling a more nuanced approach to this question.
- How do orthographic / phonological features vary with regard to other variables (time, place, each other, etc.)? The sheer amount of data to be processed has meant that attempts to answer this type of question have remained rather general. Multiple markups will enable the comparison of many different features to identify new correlations.
- What about the people behind the letters? The BBLs are a large enough corpus to give a somewhat representative picture of medieval Novgorodian society. By marking up as much information as possible about the people referenced in the letters, we hope to make this data as useful as possible for a range of projects.
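
As a rough illustration of the kind of quantitative query that an orthography-etymology markup could support, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the element names (`vocabulary`, `w`, `v`), the attribute names (`doc`, `etym`, `orth`), and the sample values are hypothetical placeholders, not the project's actual schema or real data from the letters. It simply counts how often each etymological vowel is written with each letter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: each <w> is a word token, each <v> a vowel with its
# etymological value (etym) and the letter actually written (orth).
# These names and values are invented placeholders, not the project's schema.
SAMPLE_XML = """<vocabulary>
  <w doc="doc-A"><v etym="ъ" orth="о"/><v etym="ь" orth="е"/></w>
  <w doc="doc-A"><v etym="ъ" orth="ъ"/></w>
  <w doc="doc-B"><v etym="ъ" orth="о"/><v etym="ě" orth="и"/></w>
</vocabulary>"""


def count_spellings(xml_text):
    """Count (etymological vowel, written letter) pairs in the markup."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for word in root.iter("w"):
        for vowel in word.iter("v"):
            counts[(vowel.get("etym"), vowel.get("orth"))] += 1
    return counts


counts = count_spellings(SAMPLE_XML)
for (etym, orth), n in sorted(counts.items()):
    print(f"{etym} written as {orth}: {n}")
```

Adding the hypothetical `doc` attribute (or a date or find-spot attribute) to the counter key would give the same tally broken down by document, which is the sort of comparison across variables raised in the second question.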
