Natural language processing (NLP) plays a major role in providing an interface for human-computer communication. It enables people to communicate with a computer in their own language rather than in machine language. A part-of-speech (POS) tagger is a basic component of most NLP tools, such as machine translation and information extraction systems. Researchers have developed POS taggers for languages such as English, Amharic, Afaan Oromo, and Tigrigna, but many other languages, such as Shekki’noono, still have none.

The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language, one that assigns a word class to each word in a given sentence or paragraph. After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopted the Hidden Markov Model (HMM) approach and implemented unigram and bigram models with the Viterbi algorithm. The unigram model is used to understand word ambiguity in the language, while the bigram model is used for contextual analysis of words. A manually annotated sample corpus of 159 sentences (1,621 words in total) is used for training and testing. The corpus was collected from different public Afaan Oromo newspapers and bulletins to keep the sample balanced. A database of lexical probabilities and transitional probabilities is built from the annotated corpus; the tagger learns from these two sets of probabilities and uses them to tag sequences of words in sentences. The performance of the prototype Afaan Oromo tagger is tested using tenfold cross-validation. The results show accuracies of 87.58% for the unigram model and 91.97% for the bigram model.
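To make the role of the lexical and transitional probabilities more concrete, here is a minimal Python sketch of a unigram tagger and a bigram HMM tagger decoded with the Viterbi algorithm. It is an illustration only, not the study's implementation: the toy English corpus, the tag names, the function names, and the add-one smoothing are all assumptions introduced here, and the unigram tagger is interpreted simply as "pick the tag most often seen with the word".

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy annotated corpus of (word, tag) pairs; NOT the study's data.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("the", "DET"), ("dogs", "NOUN"), ("run", "VERB")],
]

def train(corpus):
    """Count lexical (tag -> word) and transitional (tag -> next tag) events."""
    emit = defaultdict(Counter)
    trans = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

def unigram_tag(words, emit):
    """Assign each word the tag it was seen with most often, ignoring context."""
    tag_totals = {t: sum(c.values()) for t, c in emit.items()}
    fallback = max(tag_totals, key=tag_totals.get)  # used for unknown words
    result = []
    for w in words:
        seen = {t: c[w] for t, c in emit.items() if c[w] > 0}
        result.append(max(seen, key=seen.get) if seen else fallback)
    return result

def viterbi_tag(words, emit, trans):
    """Find the most probable tag sequence under a bigram HMM."""
    tags = sorted(emit)
    vocab_size = len({w for c in emit.values() for w in c}) + 1  # +1 for unknowns
    # best[i][t] = (log score of best path ending in tag t at position i, backpointer)
    best = [{t: (log_prob(trans["<s>"], t, len(tags))
                 + log_prob(emit[t], words[0], vocab_size), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], vocab_size)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

emit, trans = train(CORPUS)
sentence = ["the", "run", "ends"]
print(unigram_tag(sentence, emit))
print(viterbi_tag(sentence, emit, trans))
```

On this toy data, the word "run" is seen more often as a verb than as a noun, so the unigram tagger labels it VERB, while the bigram tagger uses the preceding determiner to label it NOUN. This is the kind of contextual disambiguation the bigram model is meant to provide.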