Reputation: 73
Is there a way to call stanford parser from command line so that it parses one sentence at a time, and in case of troubles at a specific sentence just goes over to the next sentence?
UPDATE:
I have been adapting the script posted below by StanfordNLP Help. However, I noticed that with the latest version of CoreNLP (2015-04-20) there are problems with the CCprocessed dependencies: collapsing simply does not appear to take place (if I grep for prep_ in the output, I find nothing). Collapsing does work with the 2015-04-20 release and the PCFG parser, for example, so I assume the issue is model-specific.
If I use the very same Java class with CoreNLP 2015-01-29 (with depparse.model changed to parse.model, and the original-dependencies part removed), collapsing works just fine. Maybe I am just using the parser in the wrong way; that's why I am re-posting here instead of starting a new post. Here is the updated code of the class:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class StanfordSafeLineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, depparse");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("tokenize.whitespace", "false");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz");
        props.setProperty("parse.originalDependencies", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // open file
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        // go through each sentence, skipping any that fail to annotate
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            try {
                Annotation annotation = new Annotation(line);
                pipeline.annotate(annotation);
                CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
                System.out.println("sentence: " + line);
                for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
                    Integer identifier = token.get(CoreAnnotations.IndexAnnotation.class);
                    String word = token.get(CoreAnnotations.TextAnnotation.class);
                    String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                    String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                    System.out.println(identifier + "\t" + word + "\t" + pos + "\t" + lemma);
                }
                SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                SemanticGraph tree2 = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
                System.out.println("---BASIC");
                System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
                System.out.println("---CCPROCESSED---");
                System.out.println(tree2.toString(SemanticGraph.OutputFormat.READABLE) + "</s>");
            } catch (Exception e) {
                System.out.println("Error with this sentence: " + line);
                System.out.println("");
            }
        }
    }
}
Upvotes: 1
Views: 1263
Reputation: 8739
Here is some sample code for your needs:
import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;

public class StanfordSafeLineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, depparse");
        props.setProperty("ssplit.eolonly", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // open file
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        // go through each sentence, skipping any that fail to annotate
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            try {
                Annotation annotation = new Annotation(line);
                pipeline.annotate(annotation);
                CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
                SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                System.out.println("---");
                System.out.println("sentence: " + line);
                System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
            } catch (Exception e) {
                System.out.println("---");
                System.out.println("Error with this sentence: " + line);
            }
        }
    }
}
Upvotes: 0
Reputation: 8739
There are many ways to handle this.
The way I'd do it is to run the Stanford CoreNLP pipeline.
Here is where you can get the appropriate jar:
http://nlp.stanford.edu/software/corenlp.shtml
After you cd into the directory stanford-corenlp-full-2015-04-20
you can issue this command:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -outputFormat text -file sample_sentences.txt
sample_sentences.txt should contain the sentences you want to parse, one sentence per line.
This will put the results in sample_sentences.txt.out, which you can extract with some light scripting.
If you change -outputFormat from text to json, you will get JSON that you can easily load and pull the parses from.
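Once the JSON output is written, the parse trees can be pulled out with a few lines of code. The sketch below assumes the output has a top-level "sentences" array whose items carry the parse tree in a "parse" field (check the actual file before relying on this); a real JSON library (Jackson, Gson, ...) would be the robust choice, and the regex is only for quick scripting:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractParses {

    // Pull every "parse" string value out of the JSON text.
    static List<String> parses(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\"parse\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"").matcher(json);
        while (m.find()) {
            // Undo the JSON escaping of newlines and quotes.
            out.add(m.group(1).replace("\\n", "\n").replace("\\\"", "\""));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the .json output file
        // (schema assumed, as described above).
        String json = "{\"sentences\": [{\"index\": 0, "
                + "\"parse\": \"(ROOT (NP (NN example)))\"}]}";
        for (String p : parses(json)) {
            System.out.println(p);
        }
    }
}
```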
If you have any issues with this approach let me know and I can modify the answer to further assist you/clarify!
UPDATE:
I am not sure exactly how you are running things, but these options could be helpful.
If you run the pipeline on a list of files with -fileList rather than on a single file, and add the flag -continueOnAnnotateError, it should skip any file it fails to annotate. That is progress, though admittedly not quite skipping just the bad sentence.
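The -fileList route can recover per-sentence skipping if each sentence is first written to its own file, since a bad sentence then only sinks its own file. A minimal sketch of that preprocessing step (all file and directory names here are made up for illustration):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MakeFileList {

    // Write each line (sentence) of `input` to its own file under `dir`
    // and return the path of the list file to hand to -fileList.
    static Path split(Path input, Path dir) throws IOException {
        Files.createDirectories(dir);
        List<String> sentences = Files.readAllLines(input, StandardCharsets.UTF_8);
        Path fileList = dir.resolve("filelist.txt");
        try (PrintWriter list = new PrintWriter(Files.newBufferedWriter(fileList, StandardCharsets.UTF_8))) {
            for (int i = 0; i < sentences.size(); i++) {
                Path sent = dir.resolve("sent_" + i + ".txt");
                Files.write(sent, (sentences.get(i) + "\n").getBytes(StandardCharsets.UTF_8));
                list.println(sent);
            }
        }
        return fileList;
    }

    public static void main(String[] args) throws IOException {
        // Demo input: two sentences, one per line, as -ssplit.eolonly expects.
        Path input = Files.createTempFile("sample_sentences", ".txt");
        Files.write(input, "The cat sat .\nAnother sentence .\n".getBytes(StandardCharsets.UTF_8));
        Path fileList = split(input, Files.createTempDirectory("split_sentences"));
        System.out.println("wrote " + fileList);
        // Then point the pipeline at it:
        // java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP \
        //   -annotators tokenize,ssplit,pos,parse -ssplit.eolonly \
        //   -fileList <fileList> -continueOnAnnotateError -outputFormat text
    }
}
```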
I wrote some Java that does exactly what you need, so I'll try to post it in the next 24 hours in case you just want to use my whipped-together code; I'm still looking it over...
Upvotes: 2