Gabriella Lapesa

Reputation: 73

Stanford CoreNLP - dashes

I am having a problem using the Stanford pipeline (latest version of CoreNLP) to parse the BNC (British National Corpus).

This is the problematic excerpt. The dashes cause the problem: if I remove them, the sentence goes through.

"... they did it again and again — on and off for years."

The parser simply gets stuck on this sentence and does not even throw an error. The same sentence is parsed correctly in the web interface.
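If the sentences are fed to the parser from your own Java code rather than through -file, a per-sentence timeout can stop a single problematic sentence from stalling a whole BNC run, and it also tells you which sentence was responsible. The following is a minimal sketch using only java.util.concurrent. HangDetector, runWithTimeout and the fakeParse stand-in are hypothetical names, and the code does not call CoreNLP itself. You would pass your actual per-sentence parsing call as the Function.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class HangDetector {

    /**
     * Applies work to input on a separate daemon thread and waits at most
     * timeoutMillis. Returns the result, or Optional.empty() if the call
     * timed out, threw an exception, or returned null.
     */
    public static <T, R> Optional<R> runWithTimeout(Function<T, R> work, T input, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker will not keep the JVM alive
            return t;
        });
        try {
            Future<R> future = executor.submit(() -> work.apply(input));
            try {
                return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Interrupt the worker. Code that ignores interrupts keeps running
                // in the background, because Java cannot forcibly stop a thread.
                future.cancel(true);
                return Optional.empty();
            } catch (ExecutionException e) {
                return Optional.empty(); // the work itself threw an exception
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return Optional.empty();
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real per-sentence parse call: counts tokens.
        Function<String, Integer> fakeParse = s -> s.split("\\s+").length;
        for (String sentence : args) {
            Optional<Integer> result = runWithTimeout(fakeParse, sentence, 60_000);
            System.out.println((result.isPresent() ? "ok: " : "TIMEOUT/ERROR: ") + sentence);
        }
    }
}
```

One caveat about this design: a parser caught in a loop that never checks for interrupts will keep using CPU after the timeout. The daemon flag only ensures that such a thread does not prevent the JVM from exiting.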

I experimented with the tokenizer options, without success.

Here is the command line I am using:

    java [...] edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,depparse -tokenize.whitespace false -ssplit.eolonly true -parse.model edu/stanford/nlp/models/parser/nndep/english_SD.gz -file $inputfile
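Because the sentence goes through once the dashes are removed, one possible stopgap, assuming the dash really is the trigger, is to replace Unicode dashes with ASCII before running the pipeline. Below is a minimal Java sketch. The class name, the set of dash characters, and " -- " as the replacement are assumptions and not something CoreNLP prescribes. The input is assumed to be UTF-8. Each line is rewritten separately, so the line breaks that -ssplit.eolonly true relies on are kept.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DashNormalizer {

    // Figure dash (U+2012), en dash (U+2013), em dash (U+2014), horizontal bar (U+2015).
    private static final Pattern UNICODE_DASHES = Pattern.compile("[\u2012-\u2015]");
    private static final Pattern MULTIPLE_SPACES = Pattern.compile(" {2,}");

    /** Replaces each Unicode dash with " -- ", then collapses any run of spaces to one. */
    public static String normalize(String line) {
        String replaced = UNICODE_DASHES.matcher(line).replaceAll(" -- ");
        return MULTIPLE_SPACES.matcher(replaced).replaceAll(" ");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java DashNormalizer input.txt output.txt (UTF-8 assumed).
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> normalized = new ArrayList<>();
        for (String line : lines) {
            normalized.add(normalize(line));
        }
        // Files.write terminates every line with the platform's line separator.
        Files.write(Paths.get(args[1]), normalized, StandardCharsets.UTF_8);
    }
}
```

You would then pass the normalized file to -file in place of the original $inputfile.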

Does anybody have a suggestion on how to cope with this issue?

Thanks a lot in advance!

Gabriella

Upvotes: 0

Views: 405

Answers (1)

Christopher Manning

Reputation: 9450

Running with Stanford CoreNLP v.3.5.2 on OS X 10.10.4, I couldn't reproduce this problem. The example string given was parsed just fine.

There could be a problem, but if so it is subtle. You'd want to give similar details: your Stanford NLP version, your OS and its version, and a text file that fails, posted somewhere it can be downloaded. That way we can make sure the problem isn't something like line endings, which get lost when text is pasted into a web page.
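Line endings matter here because with -ssplit.eolonly true, sentence boundaries come entirely from the line breaks. Before posting a failing file, you could check which line-ending conventions it actually contains. The following is a minimal, stdlib-only Java sketch, and the class and method names are hypothetical. It reads raw bytes so that nothing is normalized on the way in. It also reports a UTF-8 byte order mark, which is another kind of invisible difference that pasting into a web page tends to hide.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineEndingReport {

    /** Returns {CRLF count, bare LF count, bare CR count} for raw file bytes. */
    public static int[] countLineEndings(byte[] data) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    public static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java LineEndingReport failing-input.txt
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int[] counts = countLineEndings(data);
        System.out.printf("CRLF: %d, LF only: %d, CR only: %d, UTF-8 BOM: %b%n",
                counts[0], counts[1], counts[2], hasUtf8Bom(data));
    }
}
```

Mixed or CR-only line endings would be worth mentioning when reporting the problem, alongside the version and OS details.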

Upvotes: 1
