Reputation: 683
I'm trying to extract information from natural language content using the Stanford CoreNLP library.
My goal is to extract "subject-action-object" pairs (simplified) from sentences.
As an example, consider the following text:
John Smith only eats an apple and a banana for lunch. He's on a diet and his mother told him that it would be very healthy to eat less for lunch. John doesn't like it at all but since he's very serious with his diet, he doesn't want to stop.
From this text I would like to get results as follows:
How would one do this?
Or to be more specific: How can I parse a dependency tree (or a better-suited tree?) to obtain results as specified above?
Any hint, resource or code snippet given this task would be highly appreciated.
Side note:
I managed to replace coreferences with their representative mention, which would then change "he" and "his" to the corresponding entity (John Smith in that case).
Upvotes: 5
Views: 5127
Reputation: 5759
You could also try out the new Stanford OpenIE system: http://nlp.stanford.edu/software/openie.shtml. In addition to the standalone download, it's now bundled in CoreNLP 3.6.0+.
Upvotes: 2
Reputation: 8739
The Stanford CoreNLP toolkit comes with a dependency parser.
First of all, here is a link that describes the types of edges in the tree:
http://universaldependencies.github.io/docs/
There are numerous ways you can use the toolkit to generate the dependency tree.
Here is some sample code to get you started:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;

public class DependencyTreeExample {

    public static void main(String[] args) throws IOException {
        // set up properties; ssplit.eolonly=true treats each input line as one sentence
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("annotators", "tokenize, ssplit, pos, depparse");

        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read the whole file named by the first argument, closing the Scanner afterwards
        String content;
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            content = scanner.useDelimiter("\\Z").next();
        }
        System.out.println(content);

        // annotate the text (one sentence per line of the file)
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);

        // print the dependency graph of each sentence
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: " + sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }
    }
}
instructions:
an example of output:
sentence: John doesn't like it at all.
dep        reln       gov
---        ----       ---
like-4     root       root
John-1     nsubj      like-4
does-2     aux        like-4
n't-3      neg        like-4
it-5       dobj       like-4
at-6       case       all-7
all-7      nmod:at    like-4
.-8        punct      like-4
This will print out the dependency parses. By working with the SemanticGraph object you can write code to find the kinds of patterns you want.
You'll note that in this example "like" points to "John" with "nsubj" and to "it" with "dobj", so the verb is the node that connects the subject and the object.
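To make the nsubj/dobj pattern concrete, here is a minimal sketch of the matching step. It is not part of CoreNLP: the names TripleSketch, Edge, parseEdges, extractTriples and word are made up for illustration, and to stay self-contained it works on the "dep reln gov" lines of the readable output rather than on the SemanticGraph object itself. In real code you would presumably walk the edges of the SemanticGraph and apply the same grouping. It also accepts "obj" as an object label on the assumption that newer releases may use that name instead of "dobj".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripleSketch {

    /** One dependency edge: dependent, relation, governor (hypothetical helper type). */
    public static final class Edge {
        final String dep;
        final String reln;
        final String gov;

        Edge(String dep, String reln, String gov) {
            this.dep = dep;
            this.reln = reln;
            this.gov = gov;
        }
    }

    /** Parses "dep reln gov" lines; the header, separators and any other lines are skipped. */
    public static List<Edge> parseEdges(String readable) {
        List<Edge> edges = new ArrayList<>();
        for (String line : readable.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 3) continue;                                  // e.g. "sentence: ..."
            if (cols[0].equals("dep") || cols[0].equals("---")) continue;    // table header
            edges.add(new Edge(cols[0], cols[1], cols[2]));
        }
        return edges;
    }

    /** Strips the "-N" token index, e.g. "like-4" becomes "like". */
    static String word(String token) {
        int dash = token.lastIndexOf('-');
        return dash > 0 ? token.substring(0, dash) : token;
    }

    /** Pairs every subject of a governor with every direct object of the same governor. */
    public static List<String> extractTriples(List<Edge> edges) {
        Map<String, List<String>> subjects = new LinkedHashMap<>();
        Map<String, List<String>> objects = new LinkedHashMap<>();
        for (Edge e : edges) {
            if (e.reln.equals("nsubj")) {
                subjects.computeIfAbsent(e.gov, k -> new ArrayList<>()).add(e.dep);
            } else if (e.reln.equals("dobj") || e.reln.equals("obj")) {      // "obj" is an assumption
                objects.computeIfAbsent(e.gov, k -> new ArrayList<>()).add(e.dep);
            }
        }
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : subjects.entrySet()) {
            List<String> objs = objects.get(entry.getKey());
            if (objs == null) continue;                                      // verb without an object
            for (String subject : entry.getValue()) {
                for (String object : objs) {
                    triples.add(word(subject) + " | " + word(entry.getKey()) + " | " + word(object));
                }
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        String readable = String.join("\n",
            "dep reln gov",
            "--- ---- ---",
            "like-4 root root",
            "John-1 nsubj like-4",
            "does-2 aux like-4",
            "n't-3 neg like-4",
            "it-5 dobj like-4",
            "at-6 case all-7",
            "all-7 nmod:at like-4",
            ".-8 punct like-4");
        for (String triple : extractTriples(parseEdges(readable))) {
            System.out.println(triple);
        }
    }
}
```

For the example sentence this prints "John | like | it". The "neg" edge is ignored here, so the negation is lost; you would probably want to check for it on the same governor, and handle passives, copulas and prepositional objects (the nmod edges) as separate patterns.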
For reference, you should look at the documentation for edu.stanford.nlp.semgraph.SemanticGraph:
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html
Upvotes: 5