Gabriella Lapesa

Reputation: 73

Stanford Parser: sentence by sentence on command line

Is there a way to call the Stanford Parser from the command line so that it parses one sentence at a time and, if a specific sentence causes trouble, simply moves on to the next one?

UPDATE:

I have been adapting the script posted by StanfordNLPHelp. However, I noticed that with the latest version of CoreNLP (2015-04-20) there are problems with the CCprocessed dependencies: collapsing appears not to take place (if I grep for prep_ in the output, I find nothing). Collapsing does work with 2015-04-20 and the PCFG parser, for example, so I assume the issue is model-specific.

If I use the very same Java class with CoreNLP 2015-01-29 (with depparse.model changed to parse.model, and the original dependencies part removed), collapsing works just fine. Maybe I am just using the parser in the wrong way, which is why I am re-posting here rather than starting a new post. Here is the updated code of the class:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;


public class StanfordSafeLineExample {

public static void main (String[] args) throws IOException {
    // build pipeline
    Properties props = new Properties();
    props.setProperty("annotators","tokenize, ssplit, pos, lemma, depparse");
    props.setProperty("ssplit.eolonly","true");
    props.setProperty("tokenize.whitespace","false");
    props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz");
    props.setProperty("parse.originalDependencies", "true");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // open file
    BufferedReader br = new BufferedReader(new FileReader(args[0]));
    // go through each sentence
    for (String line = br.readLine() ; line != null ; line = br.readLine()) {
        try {
            Annotation annotation = new Annotation(line);
            pipeline.annotate(annotation);
            ArrayList<String> edges = new ArrayList<String>();
            CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);

            System.out.println("sentence: "+line);
            for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
                Integer identifier = token.get(CoreAnnotations.IndexAnnotation.class);
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                System.out.println(identifier+"\t"+word+"\t"+pos+"\t"+lemma);
            }

            SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            SemanticGraph tree2 = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println("---BASIC");
            System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
            System.out.println("---CCPROCESSED---");
            System.out.println(tree2.toString(SemanticGraph.OutputFormat.READABLE)+"</s>");
        } catch (Exception e) {

            System.out.println("Error with this sentence: "+line);
            System.out.println("");
        }
    }

}

}

Upvotes: 1

Views: 1263

Answers (2)

StanfordNLPHelp

Reputation: 8739

Here is some sample code that does what you need:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*; 
import edu.stanford.nlp.util.*;


public class StanfordSafeLineExample {

        public static void main (String[] args) throws IOException {
            // build pipeline
            Properties props = new Properties();
            props.setProperty("annotators","tokenize, ssplit, pos, depparse");
            props.setProperty("ssplit.eolonly","true");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // open the file; try-with-resources closes the reader when done
            try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
                // go through each sentence (one per line)
                for (String line = br.readLine() ; line != null ; line = br.readLine()) {
                    try {
                        Annotation annotation = new Annotation(line);
                        pipeline.annotate(annotation);
                        // ssplit.eolonly means each line yields at most one sentence;
                        // an empty line makes get(0) throw, which is caught below
                        CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
                        SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                        System.out.println("---");
                        System.out.println("sentence: "+line);
                        System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
                    } catch (Exception e) {
                        // report the failing sentence and continue with the next line
                        System.out.println("---");
                        System.out.println("Error with this sentence: "+line);
                    }
                }
            }

        }
}

Instructions:

  • Copy and paste this into StanfordSafeLineExample.java
  • Put that file in the directory stanford-corenlp-full-2015-04-20
  • Compile it: javac -cp "*:." StanfordSafeLineExample.java
  • Add your sentences, one sentence per line, to a file called sample_sentences.txt
  • Run it: java -cp "*:." StanfordSafeLineExample sample_sentences.txt
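
The skip-and-continue loop in the class above does not depend on CoreNLP itself. As a minimal sketch of the same pattern, the class below replaces the parser call with a hypothetical LineHandler interface (my own name, not part of CoreNLP) and also records which line numbers failed, which can be handy when you want to inspect the bad sentences afterwards. Note that catching Exception does not catch an OutOfMemoryError; that is an assumption worth checking if very long sentences are the source of your trouble.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SafeLineLoop {

    // Hypothetical stand-in for the per-sentence work
    // (in the answer above: pipeline.annotate plus printing the graph).
    interface LineHandler {
        String handle(String line) throws Exception;
    }

    // Applies the handler to every line. A line whose handler throws is
    // skipped, and its 1-based line number is appended to failedLines.
    static List<String> run(BufferedReader reader, LineHandler handler,
                            List<Integer> failedLines) throws IOException {
        List<String> results = new ArrayList<String>();
        int lineNo = 0;
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lineNo++;
            try {
                results.add(handler.handle(line));
            } catch (Exception e) {
                failedLines.add(lineNo);
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Three input lines; the empty second line makes the toy handler fail.
        BufferedReader reader = new BufferedReader(new StringReader("a b\n\nc d e\n"));
        List<Integer> failed = new ArrayList<Integer>();
        List<String> out = run(reader, new LineHandler() {
            public String handle(String line) {
                if (line.isEmpty()) {
                    throw new IllegalArgumentException("empty line");
                }
                return line.split(" ").length + " tokens";
            }
        }, failed);
        System.out.println(out);
        System.out.println("failed lines: " + failed);
    }
}
```

To use this with the parser, the body of handle would hold the annotate call and return whatever string you want printed for that sentence.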

Upvotes: 0

StanfordNLPHelp

Reputation: 8739

There are many ways to handle this.

The way I'd do it is to run the Stanford CoreNLP pipeline.

Here is where you can get the appropriate jar:

http://nlp.stanford.edu/software/corenlp.shtml

After you cd into the directory stanford-corenlp-full-2015-04-20, you can issue this command:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -outputFormat text -file sample_sentences.txt

sample_sentences.txt should contain the sentences you want to parse, one sentence per line.

This will put the results in sample_sentences.txt.out, which you can extract with some light scripting.

If you change -outputFormat to json instead of text, you will get JSON output that you can easily load to get the parses.

If you have any issues with this approach let me know and I can modify the answer to further assist you/clarify!

UPDATE:

I am not sure exactly how you are running things, but these options could be helpful.

If you use -fileList to run the pipeline on a list of files rather than on a single file, and add the flag -continueOnAnnotateError, it should just skip the bad file. That is progress, though admittedly not the same as skipping only the bad sentence.
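
One possible way to get closer to per-sentence skipping with these two flags is to give every sentence its own file. The following is only a sketch under assumptions: the class name SplitSentencesToFiles, the sentence_NNNNNN.txt naming, and the filelist.txt name are all made up for illustration, and it assumes the file passed to -fileList holds one path per line. It uses only the standard Java library and does not call CoreNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SplitSentencesToFiles {

    // Writes each non-empty line of `input` to its own file in `outDir`.
    // The file name carries the 1-based line number of the sentence in the
    // input, so a skipped file can be traced back to its original line.
    static List<Path> split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<Path>();
        int lineNo = 0;
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            lineNo++;
            if (line.trim().isEmpty()) {
                continue; // nothing to parse on an empty line
            }
            Path target = outDir.resolve(String.format("sentence_%06d.txt", lineNo));
            Files.write(target, (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
            written.add(target);
        }
        return written;
    }

    // Writes the given paths to `listFile`, one path per line
    // (assumed here to be the format the file list option expects).
    static void writeFileList(List<Path> files, Path listFile) throws IOException {
        List<String> lines = new ArrayList<String>();
        for (Path p : files) {
            lines.add(p.toString());
        }
        Files.write(listFile, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);   // e.g. sample_sentences.txt
        Path outDir = Paths.get(args[1]);  // directory for the per-sentence files
        List<Path> files = split(input, outDir);
        writeFileList(files, outDir.resolve("filelist.txt"));
        System.out.println("Wrote " + files.size() + " sentence files");
    }
}
```

After running it, the generated list could be passed to the pipeline in place of -file, together with -continueOnAnnotateError. The trade-off is one output file per sentence, which then has to be merged again.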

I wrote some Java that does exactly what you need, and I'll try to post it in the next 24 hours in case you just want to use my quickly assembled code. I'm still looking it over.

Upvotes: 2
