Reputation: 73
I am having a problem using the Stanford pipeline (latest version of CoreNLP) to parse the BNC.
The problematic sentence excerpt is given below, and the problem is the dashes (if I remove them, parsing goes through).
"... they did it again and again — on and off for years."
The parser just gets stuck on this sentence, and it does not even throw an error. The same sentence is parsed correctly in the web interface.
I tried changing the tokenizer options, with no effect.
This is the command line I am using:
java [...] edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,depparse -tokenize.whitespace false -ssplit.eolonly true -parse.model edu/stanford/nlp/models/parser/nndep/english_SD.gz -file $inputfile
Does anybody have a suggestion on how to cope with this issue?
Thanks a lot in advance!
Gabriella
Upvotes: 0
Views: 405
Reputation: 9450
Running with Stanford CoreNLP v.3.5.2 on OS X 10.10.4, I couldn't reproduce this problem. The example string given was parsed just fine.
There could be a problem, but if so it is subtle. You'd want to give the same kind of information (Stanford NLP version, OS and OS version) and to put a text file that fails somewhere it can be downloaded, to make sure the cause isn't something like line endings that get lost when text is pasted into a web page.
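One way to see whether the input file differs from the pasted text is to look at its raw bytes and summarize the line endings and non-ASCII characters (such as the dash) it contains. The following is only a minimal sketch in Python and is not part of CoreNLP; the function name `inspect_text` and the sample bytes are hypothetical, and UTF-8 is an assumed encoding that may not match the actual corpus files.

```python
import unicodedata
from collections import Counter


def inspect_text(data: bytes, encoding: str = "utf-8"):
    """Summarize line endings and non-ASCII characters in raw file bytes.

    `encoding` is an assumption; bytes that do not decode are shown
    as U+FFFD REPLACEMENT CHARACTER rather than raising an error.
    """
    crlf = data.count(b"\r\n")
    endings = {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # bare \n not preceded by \r
        "CR": data.count(b"\r") - crlf,  # bare \r not followed by \n
    }
    text = data.decode(encoding, errors="replace")
    non_ascii = Counter(ch for ch in text if ord(ch) > 127)
    chars = {
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}": count
        for ch, count in non_ascii.items()
    }
    # Keep only the line-ending styles that actually occur.
    return {kind: n for kind, n in endings.items() if n}, chars


# Hypothetical sample: the excerpt with an em dash and a Windows line ending.
sample = "they did it again and again \u2014 on and off for years.\r\n".encode("utf-8")
endings, chars = inspect_text(sample)
print("line endings:", endings)
print("non-ASCII characters:", chars)
```

For a real input file, the bytes would be read with `open(path, "rb").read()` and passed to the same function. If the report for the failing file differs from the report for the text copied out of the web page, that difference is worth including in the bug report.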
Upvotes: 1