tired and bored dev

Reputation: 693

Sentence Detection with OpenNLP

I'm trying out the OpenNLP sentence detection tool. The text is in a file, para3.txt, with these contents:

Bob went to London Mary came from Paris Now everything is fine.

I'm running it with the following command:

opennlp SentenceDetector ../models/en-sent.bin < para3.txt

I get output like this:

Bob went to London Mary came from Paris Now everything is fine.

Ideally, I would see three sentences in the output:

Bob went to London.
Mary came from Paris.
Now everything is fine.

If I try other text in which a full stop (period) is present, sentence detection works fine. A human would guess that there are three sentences in this text, but how can I get OpenNLP to do the same? Which NLP tools could help here? What is the next level of sentence detection?

Upvotes: 2

Views: 1400

Answers (2)

user4894151

Reputation:

You should train your own model to detect this type of sentence, i.e. follow the sentence detector training procedure given in the documentation. Create a training file, en-sent.train. The only requirement is that each sentence is on a separate line in the training file, like below:

Sentence 1

Sentence 2

Sentence 3

……

……

Then train from the command-line interface:

opennlp SentenceDetectorTrainer -model en-sent_trained.bin -lang en -data en-sent.train -encoding UTF-8

This will produce a model file: en-sent_trained.bin

Now use this .bin file instead of en-sent.bin.

Hope this helps!
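
Preparing the training file can be scripted. Below is a minimal Python sketch, not part of OpenNLP itself: the function names and the sample sentences (borrowed from the question) are hypothetical. It only normalizes text so that each sentence ends up on its own line. It says nothing about how well the resulting model will handle text without punctuation.

```python
from pathlib import Path
from typing import Iterable


def to_training_text(sentences: Iterable[str]) -> str:
    """Return training-file text with exactly one sentence per line.

    Inner whitespace (including stray newlines) is collapsed to single
    spaces, and empty entries are dropped.
    """
    lines = [" ".join(s.split()) for s in sentences]
    lines = [line for line in lines if line]
    return "\n".join(lines) + "\n"


def write_training_file(sentences: Iterable[str], path: str) -> None:
    """Write the sentences to `path` as UTF-8, one sentence per line."""
    Path(path).write_text(to_training_text(sentences), encoding="utf-8")


# Hypothetical sample data, taken from the question for illustration only.
SAMPLE_SENTENCES = [
    "Bob went to London",
    "  Mary came\nfrom Paris ",
    "",
    "Now everything is fine.",
]

if __name__ == "__main__":
    # Print instead of writing, so running the sketch has no side effects.
    # To create the file: write_training_file(SAMPLE_SENTENCES, "en-sent.train")
    print(to_training_text(SAMPLE_SENTENCES), end="")
```

The file written by write_training_file can then be passed to the -data option of the training command above.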

Upvotes: 2

Mohamed Gad-Elrab

Reputation: 656

This actually looks like malformed text. You can use chunking information to divide it into sentences with some heuristics.
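
One possible heuristic, sketched in Python under loud assumptions: the chunk labels below are written by hand for the sentence in the question (in practice they would come from a chunker), and the rule "a noun phrase directly followed by a verb phrase starts a new clause once the current clause already has a verb" is only an illustration, not an OpenNLP feature. The function name and the sample data are hypothetical.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (chunk label, chunk text)


def split_on_clauses(chunks: List[Chunk]) -> List[str]:
    """Split a chunk sequence into sentences using a subject+verb heuristic."""
    sentences: List[List[Chunk]] = []
    current: List[Chunk] = []
    seen_verb = False
    for i, (label, text) in enumerate(chunks):
        next_label = chunks[i + 1][0] if i + 1 < len(chunks) else None
        # An NP immediately followed by a VP looks like a new subject.
        starts_clause = label == "NP" and next_label == "VP"
        if starts_clause and seen_verb:
            # An adverb right before the new subject moves with it
            # ("Now everything is fine").
            carried: List[Chunk] = []
            if current[-1][0] == "ADVP":
                carried.append(current.pop())
            sentences.append(current)
            current = carried
            seen_verb = False
        current.append((label, text))
        if label == "VP":
            seen_verb = True
    if current:
        sentences.append(current)
    return [" ".join(text for _, text in s) for s in sentences]


# Hand-tagged chunks for the text in the question (an assumption, not
# real chunker output); the trailing period is left out.
SAMPLE_CHUNKS: List[Chunk] = [
    ("NP", "Bob"), ("VP", "went"), ("PP", "to"), ("NP", "London"),
    ("NP", "Mary"), ("VP", "came"), ("PP", "from"), ("NP", "Paris"),
    ("ADVP", "Now"), ("NP", "everything"), ("VP", "is"), ("ADJP", "fine"),
]

if __name__ == "__main__":
    for sentence in split_on_clauses(SAMPLE_CHUNKS):
        print(sentence)
```

A rule this simple will break on relative clauses, coordinated verbs and similar constructions, so treat it as a starting point for experiments rather than a general solution.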

Upvotes: 0
