Raman

Reputation: 19675

Stanford CoreNLP Phrase POS tags and lemmatization explanation

I get an unexpected result when lemmatizing the phrase:

Gathered requirements

Using the CoreNLP online tool, POS tagging and lemmatizing this phrase gives:

[Screenshot: CoreNLP output for "Gathered requirements"]

For some reason "Gathered" is given a POS-tag of "JJ" ("adjective"), which presumably results in the lemma being "gathered" rather than "gather".

If the input phrase is "gathered requirements" (i.e. lower-cased), then the POS tag is correctly identified as a verb, and the lemmatization result is what I expected:

[Screenshot: CoreNLP output for "gathered requirements"]

Why is CoreNLP identifying Gathered as an adjective rather than a verb?

Upvotes: 2

Views: 400

Answers (1)

Alikbar

Reputation: 697

The tagger picks the POS tag that is most probable for the token "Gathered" as it appears in "Gathered requirements". Only certain kinds of words, such as named entities and sentence-initial words, start with a capital letter. The likely reason "Gathered" comes out as JJ when capitalized is that, at the start of a sentence, it was used mostly as an adjective rather than a verb in the data the tagger learned from.
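To make that argument concrete, here is a toy sketch of a case-sensitive "most frequent tag" lookup. It is not how CoreNLP's tagger is implemented, and the counts in `TOY_CORPUS` are invented purely for illustration; the names `build_tag_counts` and `most_likely_tag` are hypothetical helpers, not part of any library. It only shows how the same word can end up with different tags once capitalization is treated as part of the evidence.

```python
from collections import Counter

# Invented (token, tag) observations, NOT real corpus statistics.
# The capitalized form is assumed to appear mostly as an adjective,
# the lower-cased form mostly as a verb.
TOY_CORPUS = [
    ("Gathered", "JJ"), ("Gathered", "JJ"), ("Gathered", "VBN"),
    ("gathered", "VBD"), ("gathered", "VBD"),
    ("gathered", "VBN"), ("gathered", "JJ"),
]


def build_tag_counts(corpus):
    """Count how often each tag was seen for each case-sensitive token."""
    counts = {}
    for token, tag in corpus:
        counts.setdefault(token, Counter())[tag] += 1
    return counts


def most_likely_tag(counts, token):
    """Return the most frequent tag for the exact token, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


TAG_COUNTS = build_tag_counts(TOY_CORPUS)

if __name__ == "__main__":
    for word in ("Gathered", "gathered"):
        print(word, most_likely_tag(TAG_COUNTS, word))
```

With these made-up counts, the lookup for "Gathered" and "gathered" returns different tags, which mirrors the behaviour described in the question. A real tagger also uses the surrounding words and other features, so this should be read only as an illustration of the capitalization effect.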

Upvotes: 2
