Stanford CoreNLP - dashes

Question

I am having a problem using the Stanford CoreNLP pipeline (latest version) to parse the BNC.

The problematic sentence excerpt is below; the problem is the dashes (if I remove them, parsing goes through).

"... they did it again and again — on and off for years."

The parser just gets stuck on this sentence and does not even throw an error. The same sentence is parsed correctly in the web interface.

I tried changing the tokenizer options, with no result.

This is the command line I am using:

java [...] edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,depparse -tokenize.whitespace false -ssplit.eolonly true -parse.model edu/stanford/nlp/models/parser/nndep/english_SD.gz -file $inputfile
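For reference, the "remove the dashes" workaround mentioned above can be sketched as a preprocessing step that runs before the pipeline. This is only a minimal sketch under a few assumptions: the input file is UTF-8, the offending characters are the em dash (U+2014), en dash (U+2013) or horizontal bar (U+2015), and replacing them with an ASCII hyphen is acceptable for the corpus. `DashNormalizer` is a hypothetical helper, not part of CoreNLP, and it does not explain why the parser hangs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of CoreNLP): rewrites Unicode dashes
// to ASCII hyphens so the file can be passed to the pipeline via -file.
class DashNormalizer {

    // Replace en dash (U+2013), em dash (U+2014) and horizontal bar (U+2015)
    // with "-". Line breaks are left untouched, so one-sentence-per-line
    // input (as used with -ssplit.eolonly true) keeps its layout.
    static String normalize(String text) {
        return text.replaceAll("[\u2013\u2014\u2015]", "-");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DashNormalizer <input> <output>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Assumption: the corpus file is encoded as UTF-8.
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, normalize(text).getBytes(StandardCharsets.UTF_8));
    }
}
```

The cleaned output file would then be given to the command above in place of $inputfile.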

Does anybody have a suggestion on how to cope with this issue?

Thanks a lot in advance!

Gabriella

