Reputation: 446
Problem
After running CoreNLP over some text, I want to reconstruct a sentence adding the POS-tag for each Token and grouping the tokens that form an entity.
This would be easy if there were a way to see which entity a Token belongs to.
Approach
One option I am considering is to go through sentence.tokens() and look up each Token in a list containing only the Tokens from all the CoreEntityMentions for that sentence. That would tell me which CoreEntityMention the Token belongs to, so I can group them.
Another option could be to look at the character offsets of each Token in the sentence and compare them to the offsets of each CoreEntityMention.
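A minimal sketch of that offset comparison, under stated assumptions: Span is a hypothetical stand-in that carries only text and character offsets (begin inclusive, end exclusive), not a CoreNLP type. With CoreNLP the offsets would presumably come from the Token's beginPosition()/endPosition() and the mention's charOffsets(), but that wiring is not shown here.

```java
import java.util.List;

// Hypothetical stand-in types; nothing here is part of the CoreNLP API.
class OffsetMatch {
    // A piece of text with character offsets: begin inclusive, end exclusive.
    static final class Span {
        final String text;
        final int begin;
        final int end;

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // For each token, returns the index of the first mention whose span
    // fully contains the token, or -1 if no mention contains it.
    static int[] mentionIndexFor(List<Span> tokens, List<Span> mentions) {
        int[] result = new int[tokens.size()];
        for (int t = 0; t < tokens.size(); t++) {
            result[t] = -1;
            Span tok = tokens.get(t);
            for (int m = 0; m < mentions.size(); m++) {
                Span men = mentions.get(m);
                if (tok.begin >= men.begin && tok.end <= men.end) {
                    result[t] = m;
                    break;
                }
            }
        }
        return result;
    }
}
```

This is quadratic in the worst case (tokens times mentions), which should be fine for a single sentence.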
I think the question is similar to what was asked here, but since it was a while ago, maybe the API has changed since.
This is the setup:
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.pipeline.*;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "Some text with entities goes here";
CoreDocument coreDoc = new CoreDocument(text);
// annotate the document
pipeline.annotate(coreDoc);
for (CoreSentence sentence : coreDoc.sentences()) {
    // Code goes here
    List<CoreEntityMention> em = sentence.entityMentions();
}
Upvotes: 1
Views: 173
Reputation: 8739
Each token that is part of an entity mention stores the index of the entity mention in the document that it corresponds to. Given a token cl (a CoreLabel), you can read it with:
cl.get(CoreAnnotations.EntityMentionIndexAnnotation.class);
I'll make a note to add a convenience method for this in future versions.
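Once each token carries its mention index, the grouping itself could look roughly like the sketch below. Tok is a hypothetical stand-in for a CoreLabel, and its mentionIndex field is assumed to hold whatever the EntityMentionIndexAnnotation lookup returns, with null assumed for tokens outside any mention. The bracketed "word/POS" output format is just one illustrative choice.

```java
import java.util.List;

// Hypothetical stand-in types; nothing here is part of the CoreNLP API.
class MentionGrouping {
    // Word, POS tag, and entity mention index (null when the token is
    // assumed to be outside any entity mention).
    static final class Tok {
        final String word;
        final String pos;
        final Integer mentionIndex;

        Tok(String word, String pos, Integer mentionIndex) {
            this.word = word;
            this.pos = pos;
            this.mentionIndex = mentionIndex;
        }
    }

    // Renders each token as word/POS and wraps runs of consecutive tokens
    // that share the same mention index in square brackets.
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        Integer open = null; // mention index of the currently open group
        for (Tok t : tokens) {
            boolean same = open != null && open.equals(t.mentionIndex);
            if (open != null && !same) {
                sb.append(']'); // the previous group ends before this token
                open = null;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (t.mentionIndex != null && !same) {
                sb.append('['); // a new group starts at this token
                open = t.mentionIndex;
            }
            sb.append(t.word).append('/').append(t.pos);
        }
        if (open != null) {
            sb.append(']');
        }
        return sb.toString();
    }
}
```

For the tokens of "Joe Smith lives in Paris", with "Joe Smith" as mention 0 and "Paris" as mention 1, render would return "[Joe/NNP Smith/NNP] lives/VBZ in/IN [Paris/NNP]".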
Upvotes: 1