Reputation: 391
I am trying lemmatization with Stanford CoreNLP, following this question. My environment is:-
My code snippet is:-
//...........lemmatization starts........................
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for (edu.stanford.nlp.util.CoreMap sentence : sentences)
{
    for (CoreLabel token : sentence.get(TokensAnnotation.class))
    {
        String word = token.get(TextAnnotation.class);
        String lemma = token.get(LemmaAnnotation.class);
        System.out.println("lemmatized version :" + lemma);
    }
}
//...........lemmatization ends.........................
The output I get is:-
lemmatized version :painting
whereas I expect
lemmatized version :paint
Please enlighten me.
Upvotes: 0
Views: 497
Reputation: 1563
The problem in this example is that the word "painting" can be either the present participle of "to paint" or a noun, and the output of the lemmatizer depends on the part-of-speech tag assigned to the original word.
If you run the tagger on the fragment "painting" alone, there is no context that could help the tagger (or a human) decide how the word should be tagged. In this case it picked the tag NN (singular noun), and the lemma of the noun "painting" is in fact "painting".
If you run the same code with the sentence "I am painting a flower." the tagger should correctly tag "painting" as VBG (gerund or present participle), and the lemmatizer should return "paint".
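To make the dependency on the tag concrete, here is a minimal toy sketch in plain Java. It does not use CoreNLP at all: the class `PosLemmaSketch`, its `lemma` method and the small lookup table are all hypothetical, written only to illustrate that a lemmatizer keyed on (word, tag) pairs can return different lemmas for the same surface form.

```java
import java.util.Map;

// Toy illustration only: a hand-written (word, POS tag) -> lemma table.
// This is NOT how CoreNLP stores or computes lemmas; all names are hypothetical.
final class PosLemmaSketch {

    private static final Map<String, String> LEMMAS = Map.of(
        "painting/NN", "painting",   // noun: the lemma is the word itself
        "painting/VBG", "paint",     // present participle of "to paint"
        "paintings/NNS", "painting"  // plural noun
    );

    // Look up the lemma for a word under a given Penn Treebank tag;
    // fall back to the lowercased word if the pair is not in the table.
    static String lemma(String word, String tag) {
        String lower = word.toLowerCase();
        return LEMMAS.getOrDefault(lower + "/" + tag, lower);
    }

    public static void main(String[] args) {
        System.out.println(lemma("painting", "NN"));   // prints "painting"
        System.out.println(lemma("painting", "VBG"));  // prints "paint"
    }
}
```

To check which tag the real pipeline assigned, you could also print `token.get(PartOfSpeechAnnotation.class)` next to the lemma in the loop from the question (this assumes the `pos` annotator is enabled, as it is there).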
Upvotes: 2