Three attempts were made at tagging the texts. The first experiments were performed in 2005, when the texts were tagged with a method developed for Modern Icelandic. Methods of tagging Icelandic text have been developed using the tagged texts of the Icelandic Frequency Dictionary: the data-driven tagger TnT (Brants 2000) was trained on these texts (Sigrún Helgadóttir 2004, 2007). The resulting model can be used to tag new texts, and all the texts in the Saga Corpus were tagged with it. To measure tagging accuracy, four randomly selected samples of 1000 words each were used: one from the Family Sagas, one from Heimskringla and two from the Sturlunga Saga. The tags in these samples were corrected manually. Counting the correct tags in these samples gave a tagging accuracy of 88%, compared with 90.4% for texts from the Icelandic Frequency Dictionary. The structure of sentences in Old Icelandic is quite different from that in Modern Icelandic, and differences in word order would be expected to affect the accuracy of a statistical tagger such as TnT in particular, since it is based on tag trigrams. However, sentences in Old Icelandic texts are generally very short, and short sentences are easier to analyze than long ones.
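The accuracy measurement described above can be sketched in a few lines. This is a minimal Python sketch, assuming each sample is a pair of equally long lists holding the manually corrected (gold) tags and the tagger's output. The function names `tagging_accuracy` and `pooled_accuracy` and the toy tags are hypothetical; this is not the evaluation script used for the Saga Corpus.

```python
from typing import Iterable, Sequence, Tuple

# A sample pairs the manually corrected (gold) tags with the tagger's output.
Sample = Tuple[Sequence[str], Sequence[str]]


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted tag equals the manually corrected tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tag sequences must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def pooled_accuracy(samples: Iterable[Sample]) -> float:
    """Count correct tags over all samples together, then divide by the total token count."""
    correct = 0
    total = 0
    for gold, predicted in samples:
        if len(gold) != len(predicted):
            raise ValueError("gold and predicted tag sequences must have the same length")
        correct += sum(g == p for g, p in zip(gold, predicted))
        total += len(gold)
    if total == 0:
        raise ValueError("cannot compute accuracy on empty samples")
    return correct / total


if __name__ == "__main__":
    # Hypothetical toy tags; not the Icelandic Frequency Dictionary tagset.
    samples = [
        (["n", "v", "p", "n"], ["n", "v", "p", "n"]),  # 4 of 4 correct
        (["n", "v", "a", "n"], ["n", "v", "n", "n"]),  # 3 of 4 correct
    ]
    print([tagging_accuracy(g, p) for g, p in samples])  # [1.0, 0.75]
    print(pooled_accuracy(samples))  # 0.875
```

With equally sized samples, as with the four 1000-word samples here, the pooled figure equals the mean of the per-sample accuracies; with unequal sample sizes the two can differ. If the 88% was computed over all 4000 words, it corresponds to 3520 correctly tagged tokens.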