Reputation: 49
How do I calculate the accuracy for known and unknown words in part-of-speech tagging? For example, for known words, is it the number of correctly tagged known words divided by the total number of known words? Are there any other ways?
Upvotes: 0
Views: 212
Reputation: 470
I think you have the right idea. All you need is a lexicon to determine whether a given word is known or unknown. RDRPOSTagger provides a piece of code that computes tagging accuracies for known and unknown words. See the function computeAccuracies(lexicon, goldCorpus, taggedCorpus) in the Eval.py module in the Utility package.
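If you just want to see the idea, here is a minimal sketch of the calculation. It is not RDRPOSTagger's code: the function name known_unknown_accuracy and the input layout are my own assumptions. I assume the lexicon is a set of word forms (for example, the words seen in the training data) and that the gold and tagged corpora are aligned lists of (word, tag) pairs.

```python
def known_unknown_accuracy(lexicon, gold, tagged):
    """Return tagging accuracy for known words, unknown words, and overall.

    lexicon: set of word forms treated as "known" (assumed: training vocabulary)
    gold, tagged: aligned lists of (word, tag) pairs (assumed layout)
    """
    if len(gold) != len(tagged):
        raise ValueError("gold and tagged corpora must have the same length")

    # counts[group] = [number correct, number of tokens]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for (word, gold_tag), (_, predicted_tag) in zip(gold, tagged):
        group = "known" if word in lexicon else "unknown"
        counts[group][1] += 1
        if gold_tag == predicted_tag:
            counts[group][0] += 1

    def ratio(correct, total):
        # None signals "no tokens in this group" instead of dividing by zero
        return correct / total if total else None

    all_correct = counts["known"][0] + counts["unknown"][0]
    all_total = counts["known"][1] + counts["unknown"][1]
    return {
        "known": ratio(*counts["known"]),
        "unknown": ratio(*counts["unknown"]),
        "overall": ratio(all_correct, all_total),
    }


# Toy example: "loudly" and "zyx" are not in the lexicon, so they are unknown.
lexicon = {"the", "dog", "barks"}
gold = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
        ("loudly", "RB"), ("zyx", "NN")]
tagged = [("the", "DT"), ("dog", "NN"), ("barks", "NNS"),
          ("loudly", "RB"), ("zyx", "JJ")]
result = known_unknown_accuracy(lexicon, gold, tagged)
print(result)
```

In the toy example, 2 of the 3 known tokens and 1 of the 2 unknown tokens are tagged correctly. Whether the lookup should be case-sensitive or count tokens rather than types is a design choice that depends on how your lexicon was built.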
You might want to look at this paper, which presents tagging results (for known and unknown words) of three POS and morphological taggers on 13 languages: Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese.
Upvotes: 1