Stanford NER CharacterOffsetBegin

Question

I am using Stanford CoreNLP for NER for a list of short documents.

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP
-annotators tokenize,ssplit,pos,lemma,ner -ssplit.eolonly -pos.model edu/stanford/nlp/models/pos-tagger/english-caseless-left3words-distsim.tagger
-ner.model edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz
-file .../input -outputDirectory .../stanford_ner

The problem is the CharacterOffsetBegin and CharacterOffsetEnd I get from each token are continuous number from the previous documents. Therefore for example the very first token of document_2 has a CharacterOffsetBegin of 240 rather than 0. Is there any option I can use in the command line? Any help would be greatly appreciated, thanks!

Stanford NER CharacterOffsetBegin

Answers (1)

Related Questions