math

Reputation: 49

POS accuracy of known and unknown words

How do I calculate the accuracy for known and unknown words in part-of-speech tagging? For example, for known words, is it the number of correctly tagged known words divided by the total number of known words? Are there any other ways?

Upvotes: 0

Views: 212

Answers (1)

NQD

Reputation: 470

I think you have the right approach: known-word accuracy is the number of correctly tagged known words divided by the total number of known words, and the same goes for unknown words. All you need is a lexicon to decide whether a given word is known or unknown. RDRPOSTagger provides code that computes tagging accuracies for known and unknown words. See the function computeAccuracies(lexicon, goldCorpus, taggedCorpus) in the Eval.py module in the Utility package.
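If you would rather compute this yourself, here is a minimal sketch of the idea. It is not RDRPOSTagger's implementation; the function name known_unknown_accuracy and the input format (a set of known word forms plus two aligned lists of (word, tag) pairs) are my own assumptions for illustration.

```python
def known_unknown_accuracy(lexicon, gold, predicted):
    """Return (known_acc, unknown_acc, overall_acc) as fractions.

    Hypothetical helper, not part of any library.
    lexicon:   set of word forms treated as "known" (assumed input format)
    gold:      list of (word, gold_tag) pairs
    predicted: list of (word, predicted_tag) pairs, aligned with gold
    An accuracy is None when its group contains no tokens.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")

    # Per group: [number of correctly tagged tokens, total number of tokens]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for (word, gold_tag), (_, pred_tag) in zip(gold, predicted):
        group = "known" if word in lexicon else "unknown"
        counts[group][1] += 1
        if gold_tag == pred_tag:
            counts[group][0] += 1

    def ratio(correct, total):
        return correct / total if total else None

    known_correct, known_total = counts["known"]
    unknown_correct, unknown_total = counts["unknown"]
    return (
        ratio(known_correct, known_total),
        ratio(unknown_correct, unknown_total),
        ratio(known_correct + unknown_correct, known_total + unknown_total),
    )


# Toy data: "zorp" is not in the lexicon, so it counts as an unknown word.
lexicon = {"the", "dog", "barks"}
gold = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("zorp", "NN")]
predicted = [("the", "DT"), ("dog", "NN"), ("barks", "NNS"), ("zorp", "NN")]

known_acc, unknown_acc, overall_acc = known_unknown_accuracy(lexicon, gold, predicted)
print(known_acc, unknown_acc, overall_acc)
```

In this toy example two of the three known tokens are tagged correctly and the single unknown token is correct, so the known-word accuracy is 2/3, the unknown-word accuracy is 1, and the overall accuracy is 3/4. Whether you match words case-sensitively against the lexicon is a choice you have to make; the sketch compares them exactly as given.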

You might also want to look at this paper, which presents tagging results (for known and unknown words) of three POS and morphological taggers on 13 languages: Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese.

Upvotes: 1
