Iterate through tokens and find the entity for a token

Question

Problem

After running CoreNLP over some text, I want to reconstruct each sentence, adding the POS tag for each token and grouping the tokens that form an entity.

This would be easy if there were a way to see which entity a token belongs to.

Approach

One option I am considering is to go through sentence.tokens() and look up each token in a list containing only the tokens from all the CoreEntityMentions for that sentence. That would tell me which CoreEntityMention the token belongs to, so I can group them.

Another option would be to look at the character offsets of each token in the sentence and compare them to the offsets of each CoreEntityMention.
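To make the offset idea concrete, here is a minimal sketch using plain data instead of CoreNLP objects. The `Span` class and `mentionIndexForTokens` method are hypothetical names introduced for illustration. In a real pipeline the offsets would presumably come from the tokens and mentions themselves (I believe `CoreLabel.beginPosition()`/`endPosition()` and `CoreEntityMention.charOffsets()` provide them, but check your CoreNLP version).

```java
import java.util.Arrays;
import java.util.List;

class OffsetMatcher {
    /** Hypothetical helper: a character span [begin, end) in the original text. */
    static final class Span {
        final String text;
        final int begin;
        final int end;

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    /** For each token, returns the index of the first mention that fully contains it, or -1. */
    static int[] mentionIndexForTokens(List<Span> tokens, List<Span> mentions) {
        int[] result = new int[tokens.size()];
        for (int t = 0; t < tokens.size(); t++) {
            result[t] = -1;
            Span tok = tokens.get(t);
            for (int m = 0; m < mentions.size(); m++) {
                Span men = mentions.get(m);
                // The token belongs to the mention if its span lies inside the mention's span.
                if (tok.begin >= men.begin && tok.end <= men.end) {
                    result[t] = m;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written offsets for the text "Joe Smith lives in Paris".
        List<Span> tokens = Arrays.asList(
            new Span("Joe", 0, 3), new Span("Smith", 4, 9), new Span("lives", 10, 15),
            new Span("in", 16, 18), new Span("Paris", 19, 24));
        List<Span> mentions = Arrays.asList(
            new Span("Joe Smith", 0, 9), new Span("Paris", 19, 24));
        System.out.println(Arrays.toString(mentionIndexForTokens(tokens, mentions)));
        // prints [0, 0, -1, -1, 1]
    }
}
```

This is quadratic in the worst case, which should be fine for sentence-sized inputs.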

I think the question is similar to what was asked here, but since it was a while ago, maybe the API has changed since.

This is the setup:

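    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.CoreEntityMention;
    import edu.stanford.nlp.pipeline.CoreSentence;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "Some text with entities goes here";
    CoreDocument coreDoc = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(coreDoc);
    for (CoreSentence sentence : coreDoc.sentences()) {
      // entity mentions found in this sentence
      List<CoreEntityMention> em = sentence.entityMentions();
      // Code goes here
    }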

StanfordNLPHelp · Accepted Answer

Each token that is part of an entity mention stores the index of the entity mention in the document that it belongs to. Given a token `cl` (a CoreLabel), you can read it with:

    cl.get(CoreAnnotations.EntityMentionIndexAnnotation.class);

I'll make a note to add a convenience method for this in future versions.
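Once each token carries a mention index, rebuilding the sentence is a single pass over the tokens. The sketch below shows that grouping step on plain data so it runs without CoreNLP. `Tok` and `render` are hypothetical names, and the bracket/slash output format is just one choice. In real code the index would come from `EntityMentionIndexAnnotation` as above and the tag from the token's POS annotation. It assumes the index is `null` for tokens outside any mention, which you should verify against your version.

```java
import java.util.Arrays;
import java.util.List;

class MentionGrouper {
    /** Hypothetical stand-in for a CoreLabel: word, POS tag, and mention index (null if none). */
    static final class Tok {
        final String word;
        final String tag;
        final Integer mentionIndex;

        Tok(String word, String tag, Integer mentionIndex) {
            this.word = word;
            this.tag = tag;
            this.mentionIndex = mentionIndex;
        }
    }

    /** Renders word/TAG pairs, wrapping consecutive tokens of the same mention in brackets. */
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        Integer open = null; // index of the mention whose bracket is currently open
        for (Tok t : tokens) {
            boolean same = open != null && open.equals(t.mentionIndex);
            if (open != null && !same) { // the current mention ended before this token
                sb.append(']');
                open = null;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (t.mentionIndex != null && !same) { // this token starts a new mention
                sb.append('[');
                open = t.mentionIndex;
            }
            sb.append(t.word).append('/').append(t.tag);
        }
        if (open != null) {
            sb.append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("Joe", "NNP", 0), new Tok("Smith", "NNP", 0),
            new Tok("lives", "VBZ", null), new Tok("in", "IN", null),
            new Tok("Paris", "NNP", 1));
        System.out.println(render(tokens));
        // prints [Joe/NNP Smith/NNP] lives/VBZ in/IN [Paris/NNP]
    }
}
```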
