
Reputation: 683

Relationship Extraction using Stanford CoreNLP

I'm trying to extract information from natural language content using the Stanford CoreNLP library.

My goal is to extract "subject-action-object" pairs (simplified) from sentences.

As an example, consider the following text:

John Smith only eats an apple and a banana for lunch. He's on a diet and his mother told him that it would be very healthy to eat less for lunch. John doesn't like it at all but since he's very serious with his diet, he doesn't want to stop.

From this text I would like to get the following results:

John Smith - eats - only an apple and a banana for lunch
He - is - on a diet
His mother - told - him - that it would be very healthy to eat less for lunch
John - doesn't like - it (at all)
He - is - very serious with his diet

How would one do this?

Or to be more specific: How can I parse a dependency tree (or a better-suited tree?) to obtain results as specified above?

Any hint, resource or code snippet given this task would be highly appreciated.

Side note: I managed to replace coreferences with their representative mention, which changes "he" and "his" to the corresponding entity (John Smith in this case).

Upvotes: 5

Answers (2)

Gabor Angeli

Reputation: 5759

You could also try out the new Stanford OpenIE system: http://nlp.stanford.edu/software/openie.shtml. In addition to the standalone download, it's now bundled in CoreNLP 3.6.0+.

Upvotes: 2

StanfordNLPHelp

Reputation: 8739

The Stanford CoreNLP toolkit comes with a dependency parser.

First of all, here is a link that describes the types of edges in the tree:

http://universaldependencies.github.io/docs/

There are numerous ways you can use the toolkit to generate the dependency tree.

Here is some sample code to get you started:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.semgraph.*;

public class DependencyTreeExample {

    public static void main (String[] args) throws IOException {

        // set up properties
        Properties props = new Properties();
        props.setProperty("ssplit.eolonly","true");
        props.setProperty("annotators",
                "tokenize, ssplit, pos, depparse");
        // set up pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
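        // read the whole input file into one string, closing the Scanner afterwards
        String content;
        try (Scanner scanner = new Scanner(new File(args[0]))) {
            content = scanner.useDelimiter("\\Z").next();
        }
        System.out.println(content);
        // annotate the text; with ssplit.eolonly set, each line is treated as one sentence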
        Annotation annotation = new Annotation(content);
        pipeline.annotate(annotation);

        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            System.out.println("---");
            System.out.println("sentence: "+sentence);
            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
        }


    }

}

Instructions:

1. Copy and paste this into DependencyTreeExample.java.
2. Put that file in the directory stanford-corenlp-full-2015-04-20.
3. Compile it: javac -cp "*:." DependencyTreeExample.java
4. Add your sentences, one sentence per line, to a file called dependency_sentences.txt.
5. Run it: java -cp "*:." DependencyTreeExample dependency_sentences.txt

An example of the output:

sentence: John doesn't like it at all.
dep                 reln                gov                 
---                 ----                ---                 
like-4              root                root                
John-1              nsubj               like-4              
does-2              aux                 like-4              
n't-3               neg                 like-4              
it-5                dobj                like-4              
at-6                case                all-7               
all-7               nmod:at             like-4              
.-8                 punct               like-4

This will print out the dependency parses. By working with the SemanticGraph object you can write code to find the kinds of patterns you want.

You'll note that in this example "like" points to "John" with "nsubj" and to "it" with "dobj".

For reference, see the documentation for edu.stanford.nlp.semgraph.SemanticGraph:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/SemanticGraph.html
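To make the nsubj/dobj pattern concrete, here is a minimal sketch that works on the printed rows (dep, reln, gov) rather than on the SemanticGraph object itself, so it runs without CoreNLP on the classpath. The class SubjectVerbObjectSketch, the Edge holder and the helper names are hypothetical and not part of CoreNLP; with the real library you would presumably walk the graph's edges instead of parsing text. It only covers the simple case of a verb that has both an nsubj and a dobj dependent, and folds aux and neg dependents into the verb.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubjectVerbObjectSketch {

    /** One row of the printed table: dependent, relation, governor (hypothetical holder). */
    public static final class Edge {
        final String dep;
        final String reln;
        final String gov;

        Edge(String dep, String reln, String gov) {
            this.dep = dep;
            this.reln = reln;
            this.gov = gov;
        }
    }

    /** Parses whitespace-separated rows such as "John-1  nsubj  like-4". */
    public static List<Edge> parseRows(List<String> rows) {
        List<Edge> edges = new ArrayList<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            if (cols.length == 3) {
                edges.add(new Edge(cols[0], cols[1], cols[2]));
            }
        }
        return edges;
    }

    /** Strips the token index, e.g. "like-4" becomes "like". */
    public static String word(String token) {
        int dash = token.lastIndexOf('-');
        return dash > 0 ? token.substring(0, dash) : token;
    }

    /** Builds the predicate from the verb plus its aux and neg dependents, in row order. */
    public static String predicate(List<Edge> edges, String verb) {
        StringBuilder sb = new StringBuilder();
        for (Edge e : edges) {
            if (e.gov.equals(verb) && (e.reln.equals("aux") || e.reln.equals("neg"))) {
                sb.append(word(e.dep)).append(' ');
            }
        }
        return sb.append(word(verb)).toString();
    }

    /** Emits "subject - predicate - object" for each governor with both an nsubj and a dobj. */
    public static List<String> extractTriples(List<Edge> edges) {
        List<String> triples = new ArrayList<>();
        for (Edge subj : edges) {
            if (!subj.reln.equals("nsubj")) {
                continue;
            }
            for (Edge obj : edges) {
                if (obj.reln.equals("dobj") && obj.gov.equals(subj.gov)) {
                    triples.add(word(subj.dep) + " - " + predicate(edges, subj.gov)
                            + " - " + word(obj.dep));
                }
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // Rows copied from the example output above.
        List<String> rows = Arrays.asList(
                "like-4              root                root",
                "John-1              nsubj               like-4",
                "does-2              aux                 like-4",
                "n't-3               neg                 like-4",
                "it-5                dobj                like-4",
                "at-6                case                all-7",
                "all-7               nmod:at             like-4",
                ".-8                 punct               like-4");
        for (String triple : extractTriples(parseRows(rows))) {
            System.out.println(triple); // prints: John - does n't like - it
        }
    }
}
```

Sentences such as "He's on a diet" have no dobj edge, so they would need additional patterns (for example, copula and nmod relations) on top of this sketch.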

Upvotes: 5
