ATN

Reputation: 665

Parse model ignored

I'm trying to get the Stanford parser to work in my pipeline for German text, but it refuses to load the German parser model:

Properties props = new Properties();

props.put("annotators", "tokenize, ssplit, pos, parse");
props.put("ssplit.isOneSentence", "true");
props.put("pos.model", "pos-taggers/german-fast/german-fast.tagger");
props.put("pos.maxlen", "30");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz");
props.put("encoding", "utf-8");

pipeline = new StanfordCoreNLP(props);

I still get only the following output, which shows the English parser being loaded, and then a failure because the German tags are not recognized:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
Initializing lexicon scores ... The 15 open class tags are: [ TRUNC NE NN XY VVIZU ADV VVINF VVFIN VVPP CARD NN-OA ADJA FM ADJD NN-SB ] 

The failure trace:

java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories
at edu.stanford.nlp.parser.lexparser.Options.setOption(Options.java:175)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:68)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:49)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.setOptionFlags(LexicalizedParser.java:841)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:159)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:143)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:176)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:106)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:734)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:261)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at da.utils.nlp.SentimentExtractor.initPipeline(SentimentExtractor.java:111)
at da.utils.nlp.SentimentExtractor.coreAnnotate(SentimentExtractor.java:117)
at da.utils.nlp.SentimentExtractorTest.testCoreAnnotate(SentimentExtractorTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Any idea what may be wrong in my implementation?

I checked the file location with no success.

Upvotes: 3

Views: 794

Answers (1)

Christopher Manning

Reputation: 9450

The simple (if confusing) answer should be that you just need to add this line in your Properties setup:

props.put("parse.flags", "");

(This should be fixed: the flags default to an option that is useful when extracting English dependencies but is neither relevant nor available for other languages, which is why you get the error message above.)
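Put together with the settings from the question, the Properties setup might look roughly like the sketch below. The class name GermanParserProps and the build() helper are made up for illustration; the property keys and values are the ones from the question plus the empty parse.flags override. The pipeline construction is left as a comment so the snippet compiles without CoreNLP on the classpath.

```java
import java.util.Properties;

public class GermanParserProps {

    // Hypothetical helper: gathers the question's settings and adds the
    // empty parse.flags override so no English-only flag is passed along.
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, parse");
        props.setProperty("ssplit.isOneSentence", "true");
        props.setProperty("pos.model", "pos-taggers/german-fast/german-fast.tagger");
        props.setProperty("pos.maxlen", "30");
        props.setProperty("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz");
        props.setProperty("parse.flags", ""); // override the default flags
        props.setProperty("encoding", "utf-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With CoreNLP available, these properties would be handed to the pipeline:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println("parse.model = " + props.getProperty("parse.model"));
        System.out.println("parse.flags = '" + props.getProperty("parse.flags") + "'");
    }
}
```

The point of setting the key explicitly to an empty string, rather than leaving it out, is that a missing key falls back to the default flags.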

However, if this were the only problem, you would first see it load the German parser before it produced the long error dump, like this:

Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [5.2 sec].
Exception in thread "main" java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories

But in the output you show, it is still loading an English parser. So something else must be wrong. I'm not sure about this part, but two possibilities are:

  • You're running an old version of Stanford CoreNLP. A while back, the options were called "parser.model", "parser.flags", etc., but we renamed them for consistency.
  • You don't have a resource called edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz on your CLASSPATH.
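One quick way to test the second possibility is to ask the class loader for the resource directly. This is only a sketch using the standard ClassLoader.getResource call; the class name ModelOnClasspath and the isOnClasspath helper are invented for illustration, and the check has to run with the same classpath as your pipeline to be meaningful.

```java
import java.net.URL;

public class ModelOnClasspath {

    // Hypothetical helper: returns true if the named resource can be
    // found by the current class loader.
    public static boolean isOnClasspath(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        URL url = cl.getResource(resource);
        return url != null;
    }

    public static void main(String[] args) {
        String model = "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz";
        if (isOnClasspath(model)) {
            System.out.println("Found on classpath: " + model);
        } else {
            System.out.println("NOT on classpath: " + model);
        }
    }
}
```

If this reports that the resource is missing, the jar containing the German models is probably not on the classpath of the process that builds the pipeline.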

Upvotes: 2
