Piyush

Reputation: 33

Different results performing Part of Speech tagging using Core NLP and Stanford Parser?

The Part Of Speech (POS) models that the Stanford Parser and Stanford CoreNLP use are different; that's why the output of POS tagging performed with the Stanford Parser differs from that of CoreNLP.

Is there documentation comparing the two models, or any other detailed explanation of the differences?

It seems the output of CoreNLP is wrong in these cases. Beyond the few sentences I checked during error analysis, I suspect there are many more sentences with similar errors.

Upvotes: 3

Views: 631

Answers (1)

Christopher Manning

Reputation: 9450

This isn't really about CoreNLP, it's about whether you are using the Stanford POS tagger or the Stanford Parser (the PCFG parser) to do the POS tagging. (The PCFG parser usually does POS tagging as part of its parsing algorithm, although it can also use POS tags given from elsewhere.) Both sometimes make mistakes. On average, the POS tagger is a slightly better POS tagger than the parser. But, sometimes the parser wins, and in particular, it sometimes seems like it is better at tagging decisions that involve integrating clause-level information. At any rate, it gets these two examples right - though I bet you could also find some examples that go the other way.

If you want to use the PCFG parser for POS tagging in CoreNLP, simply omit the POS tagger, and move the parser earlier so that POS tags are available for the lemmatizer and regex-based NER:

java -mx3g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse,lemma,ner,dcoref -file test.txt
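The same annotator order can also be set in a properties file and passed to the pipeline with -props instead of listing the annotators on the command line. A minimal sketch (the file name is illustrative):

```
# parse-for-pos.properties (hypothetical file name)
# Omit the pos annotator; run parse before lemma and ner
# so the PCFG parser supplies the POS tags downstream.
annotators = tokenize,ssplit,parse,lemma,ner,dcoref
```

Then invoke the pipeline as: java -mx3g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props parse-for-pos.properties -file test.txt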

However, some of our other parsers (NN dependency parser, SR constituency parser) require POS tagging to have been done first.

Upvotes: 5

Related Questions