Aryeh Tuchfeld

Reputation: 9

How can I "teach" the NLP sentence splitter?

Please give me directions: how can I "teach" the splitter to split a paragraph such as

The paper is 7 cm. length. What is the painter name? the size of the picture is 5 cm. x 8 cm.

into 3 sentences, rather than into the 5 parts it produces by default:

1) The paper is 7 cm.
2) length.
3) What is the painter name?
4) the size of the picture is 5 cm.
5) x 8 cm.

Thanks,
Aryeh.

Upvotes: 0

Views: 99

Answers (1)

Sebastian Schuster

Reputation: 1563

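The tokenizer is entirely rule-based, so you can add custom abbreviations such as "cm." to it. The sentence splitter works on the tokenizer's output, so once "cm." is kept as a single abbreviation token, its period no longer ends a sentence. To do this, you will have to edit PTBLexer.flex, regenerate the lexer (PTBLexer.java) with JFlex, and recompile.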
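To illustrate why adding an abbreviation changes the split, here is a minimal standalone sketch of a rule-based splitter in Python. It is not CoreNLP code and does not use any CoreNLP API; the real change still goes into PTBLexer.flex. The abbreviation set and the rule "an abbreviation followed by a token that does not start with an uppercase letter does not end the sentence" are assumptions chosen to reproduce the 3-sentence split asked for in the question.

```python
# Hypothetical abbreviation list (an assumption for this sketch): tokens
# in this set do not end a sentence when the next token is not capitalized.
CUSTOM_ABBREVIATIONS = frozenset({"cm.", "mm.", "e.g.", "i.e."})

# Sentence-final punctuation recognized by this sketch.
SENTENCE_END = (".", "?", "!")


def split_sentences(text, abbreviations=CUSTOM_ABBREVIATIONS):
    """Split text into sentences at whitespace-delimited tokens ending in
    '.', '?' or '!', except when the token is a known abbreviation and the
    following token does not start with an uppercase letter."""
    tokens = text.split()
    sentences = []
    current = []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith(SENTENCE_END):
            continue
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if (
            token.lower() in abbreviations
            and next_token is not None
            and not next_token[0].isupper()
        ):
            # Abbreviation followed by a non-capitalized token: same sentence.
            continue
        sentences.append(" ".join(current))
        current = []
    if current:
        # Trailing text without final punctuation forms its own sentence.
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    text = ("The paper is 7 cm. length. What is the painter name? "
            "the size of the picture is 5 cm. x 8 cm.")
    for sentence in split_sentences(text):
        print(sentence)
```

With the default abbreviation set this yields the three sentences from the question; calling `split_sentences(text, abbreviations=frozenset())` falls back to splitting at every period and yields the five parts of the default behavior.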

See also "stanford corenlp, splitting sentences, abbreviation exceptions".

Upvotes: 1
