Reputation: 93
I noticed that corenlp.run can identify "10am tomorrow" and parse it out as a time. But the training tutorial and the docs I've seen only allow for one word per line. How do I get it to understand a phrase? On a related note, is there a way to tag compound entities?
Upvotes: 1
Views: 515
Reputation: 8739
Time-related phrases like that are recognized by the SUTime library. More details can be found here: https://nlp.stanford.edu/software/sutime.html
There is functionality for extracting entities after the ner tagging has been done.
For instance, if you have tagged the sentence "Joe Smith went to Hawaii ." as "PERSON PERSON O O LOCATION O", you can extract "Joe Smith" and "Hawaii". This requires the entitymentions annotator.
Here is some example code:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import java.util.*;

public class EntityMentionsExample {

    public static void main(String[] args) {
        Annotation document =
            new Annotation("John Smith visited Los Angeles on Tuesday.");
        Properties props = new Properties();
        //props.setProperty("regexner.mapping", "small-names.rules");
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        // Each entity mention is a CoreMap covering one (possibly multi-token) entity
        for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
            System.out.println(entityMention);
            //System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class));
            System.out.println(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class));
        }
    }
}
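To make the grouping idea concrete, here is a rough plain-Java sketch of how per-token tags such as PERSON PERSON O O LOCATION O can be collapsed into multi-word mentions. It does not use CoreNLP at all: the class `MentionGroupingSketch` and the method `groupMentions` are hypothetical names made up for this illustration, and the logic simply merges adjacent tokens that share the same non-O tag. The real entitymentions annotator may apply additional rules, so treat this as an approximation rather than its actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MentionGroupingSketch {

    // Hypothetical helper (not part of CoreNLP): merges runs of adjacent tokens
    // that share the same non-"O" tag into a single "text/TAG" string.
    public static List<String> groupMentions(List<String> tokens, List<String> tags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < tokens.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed, so close the mention that was open (if any)
                if (!currentTag.equals("O")) {
                    mentions.add(current + "/" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                // Still inside an entity: append this token to the open mention
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            }
        }
        // Close a mention that runs to the end of the sentence
        if (!currentTag.equals("O")) {
            mentions.add(current + "/" + currentTag);
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Joe", "Smith", "went", "to", "Hawaii", ".");
        List<String> tags = Arrays.asList("PERSON", "PERSON", "O", "O", "LOCATION", "O");
        for (String mention : groupMentions(tokens, tags)) {
            System.out.println(mention);
        }
    }
}
```

One limitation of this simplification: two different entities of the same type that sit directly next to each other would be merged into one mention.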
Upvotes: 2