user1170883

Reputation: 93

Stanford NER for phrases or compound entities

I noticed that corenlp.run can identify "10am tomorrow" and parse it out as a time. But the training tutorial and the docs I've seen only allow one word per line. How do I get it to understand a phrase? On a related note, is there a way to tag compound entities?

Upvotes: 1

Views: 515

Answers (1)

StanfordNLPHelp

Reputation: 8739

Time-related phrases like "10am tomorrow" are recognized by the SUTime library rather than by the trained NER model. More details can be found here: https://nlp.stanford.edu/software/sutime.html

For compound entities, there is functionality for extracting multi-token entities after the NER tagging has been done.

For instance, if the sentence "Joe Smith went to Hawaii ." has been tagged token by token as PERSON PERSON O O LOCATION O, you can extract "Joe Smith" and "Hawaii" as entities. This requires the entitymentions annotator.

Here is some example code:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class EntityMentionsExample {

  public static void main(String[] args) {
    Annotation document =
        new Annotation("John Smith visited Los Angeles on Tuesday.");
    Properties props = new Properties();
    //props.setProperty("regexner.mapping", "small-names.rules");
    // entitymentions runs after ner and groups the tagged tokens into mentions
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);

    // each mention is a CoreMap covering one (possibly multi-token) entity
    for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
      System.out.println(entityMention);
      //System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class));
      // the entity type of the mention, e.g. PERSON
      System.out.println(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class));
    }
  }
}
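
To make the grouping idea concrete, here is a minimal standalone sketch of how per-token tags can be merged into multi-token entities. It is only an illustration of the concept and not how the entitymentions annotator is actually implemented. The class name MentionGroupingSketch and the method groupMentions are made up for this example, and the rule it uses (merge adjacent tokens that share the same non-O tag) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not CoreNLP code.
public class MentionGroupingSketch {

  // Merges runs of adjacent tokens that share the same non-"O" tag.
  // Each result has the form "entity text/TAG".
  static List<String> groupMentions(List<String> tokens, List<String> tags) {
    List<String> mentions = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String currentTag = "O";
    for (int i = 0; i < tokens.size(); i++) {
      String tag = tags.get(i);
      if (!tag.equals(currentTag)) {
        // The tag changed, so emit the entity collected so far (if any).
        if (!currentTag.equals("O")) {
          mentions.add(current + "/" + currentTag);
        }
        current.setLength(0);
        currentTag = tag;
      }
      if (!tag.equals("O")) {
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(tokens.get(i));
      }
    }
    // Emit an entity that runs to the end of the sentence.
    if (!currentTag.equals("O")) {
      mentions.add(current + "/" + currentTag);
    }
    return mentions;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("Joe", "Smith", "went", "to", "Hawaii", ".");
    List<String> tags = Arrays.asList("PERSON", "PERSON", "O", "O", "LOCATION", "O");
    // prints [Joe Smith/PERSON, Hawaii/LOCATION]
    System.out.println(groupMentions(tokens, tags));
  }
}
```

Note that this simple rule would also merge two distinct entities of the same type that happen to be adjacent, so treat it as a way to understand the output of the annotator rather than a replacement for it.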

Upvotes: 2
