HHH

Part of speech tagging in OpenNLP vs. StanfordNLP

I'm new to part-of-speech (POS) tagging and I'm running POS tagging on a text document. I'm considering using either OpenNLP or StanfordNLP for this. For StanfordNLP I'm using a MaxentTagger with the english-left3words-distsim.tagger model. In OpenNLP I'm using a POSModel loaded from en-pos-maxent.bin. How are these two taggers (MaxentTagger and POSTagger) and their models (english-left3words-distsim.tagger and en-pos-maxent.bin) different, and which one usually gives better results?

Answers (1)

schrieveslaach

Both POS taggers are based on maximum entropy (MaxEnt) machine learning. They differ in the features they use to predict POS tags. For example, the StanfordNLP POS tagger uses "(i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs" (read more in the paper). I don't know where OpenNLP's features are documented.
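To make "same learning method, different features" concrete, here is a toy sketch of a maximum entropy (log-linear) scorer, P(tag | word) ∝ exp(Σ λ·f). The feature set, the small tag subset and the weights are all hypothetical and hand-set for illustration; neither library exposes its model like this, and real taggers learn the weights from a tagged corpus and also use context such as neighbouring words and tags.

```java
import java.util.*;

/** Toy maximum-entropy (log-linear) tag scorer. Features, tags and weights are hypothetical. */
class ToyMaxentTagger {

    // Extract binary features for a single word (hypothetical feature set, no context).
    static List<String> features(String word) {
        List<String> fs = new ArrayList<>();
        fs.add("bias");
        fs.add("word=" + word.toLowerCase());
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) fs.add("capitalized");
        if (word.endsWith("ed")) fs.add("suffix=ed");
        if (word.endsWith("ing")) fs.add("suffix=ing");
        return fs;
    }

    // Weights lambda for (feature, tag) pairs; a real tagger learns these from training data.
    static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        WEIGHTS.put("capitalized|NNP", 2.0);
        WEIGHTS.put("suffix=ed|VBD", 1.5);
        WEIGHTS.put("suffix=ed|VBN", 1.0);
        WEIGHTS.put("suffix=ing|VBG", 2.0);
        WEIGHTS.put("bias|NN", 0.5);
    }

    static final String[] TAGS = {"NN", "NNP", "VBD", "VBN", "VBG"};

    // P(tag | word) = exp(sum of weights of active features for tag) / Z(word)
    static Map<String, Double> distribution(String word) {
        List<String> fs = features(word);
        Map<String, Double> scores = new LinkedHashMap<>();
        double z = 0.0;
        for (String tag : TAGS) {
            double sum = 0.0;
            for (String f : fs) sum += WEIGHTS.getOrDefault(f + "|" + tag, 0.0);
            double e = Math.exp(sum);
            scores.put(tag, e);
            z += e;
        }
        // Normalize so the probabilities over all tags sum to 1.
        for (Map.Entry<String, Double> en : scores.entrySet()) en.setValue(en.getValue() / z);
        return scores;
    }

    // Pick the tag with the highest conditional probability.
    static String bestTag(String word) {
        Map<String, Double> d = distribution(word);
        return Collections.max(d.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        for (String w : new String[]{"London", "walked", "running", "table"}) {
            System.out.println(w + " -> " + bestTag(w) + " " + distribution(w));
        }
    }
}
```

With these made-up weights, "walked" gets VBD over VBN purely from the suffix weights; telling those two apart reliably needs context features, which is the kind of tense disambiguation the quoted Stanford features target.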

The models are probably trained on different corpora.

In general, it is really hard to tell which NLP tool performs better in terms of quality. It depends heavily on your domain, so you need to test the tools on your own data. See the following papers for more information:
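A minimal sketch of such a test, assuming you have hand-tagged a small sample from your domain and collected each tagger's output as a String[] of tags aligned with the same tokens (OpenNLP's POSTaggerME.tag(String[]) returns tags in that form). The tag arrays in main are made up for illustration, not real output from either tool.

```java
import java.util.*;

/** Compare tagger output against hand-annotated gold tags for the same tokens. */
class TaggerEval {

    private static void checkAligned(String[] gold, String[] predicted) {
        if (gold.length != predicted.length) throw new IllegalArgumentException("token counts differ");
        if (gold.length == 0) throw new IllegalArgumentException("empty sample");
    }

    // Fraction of tokens whose predicted tag equals the gold tag.
    static double accuracy(String[] gold, String[] predicted) {
        checkAligned(gold, predicted);
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(predicted[i])) correct++;
        }
        return (double) correct / gold.length;
    }

    // Counts of gold->predicted mistakes (e.g. "VBN->VBD") to see where a tagger fails.
    static Map<String, Integer> confusions(String[] gold, String[] predicted) {
        checkAligned(gold, predicted);
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < gold.length; i++) {
            if (!gold[i].equals(predicted[i])) counts.merge(gold[i] + "->" + predicted[i], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical tags for "The deal was signed yesterday ."
        String[] gold    = {"DT", "NN", "VBD", "VBN", "NN", "."};
        String[] taggerA = {"DT", "NN", "VBD", "VBN", "NN", "."};
        String[] taggerB = {"DT", "NN", "VBD", "VBD", "NN", "."};
        System.out.printf("A: %.3f %s%n", accuracy(gold, taggerA), confusions(gold, taggerA));
        System.out.printf("B: %.3f %s%n", accuracy(gold, taggerB), confusions(gold, taggerB));
    }
}
```

A few hundred hand-tagged sentences from your own texts usually tell you more than published benchmark numbers, and the confusion counts show which distinctions (e.g. VBN vs. VBD) a given model struggles with in your domain.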

In order to address this problem practically, I'm developing a Maven plugin and an annotation tool to create domain-specific NLP models more effectively.
