Totem

Reputation: 474

Sentence detection in Russian

I am using the Apache OpenNLP library. I am working on a project that needs several NLP tasks performed in different languages, and Russian is one of the most important. However, I do not know Russian and cannot find any OpenNLP models for it.

So the only way I can reliably perform sentence detection is to train a sentence detector on Russian text and produce a model that I will use later. The text I have to analyze is very domain-specific, so it is not general enough to train a valid model on.

Therefore I am asking whether anyone can provide a Russian reference text, already divided into sentences, that is general enough (contains common idioms, abbreviations, etc.). I don't know how long it should be, since the documentation doesn't specify a suggested size for training texts. However, I think a few hundred sentences might be enough.
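
As far as I understand, the OpenNLP sentence detector trainer reads plain text with one sentence per line and an empty line between documents. A minimal sketch of turning already split sentences into that layout, using only the Java standard library. The class name `TrainingFileWriter` and the method `toTrainingText` are made up for illustration and are not part of OpenNLP:

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingFileWriter {

    // Builds training text: one sentence per line, one empty line between
    // documents. Whitespace runs inside a sentence (including line breaks)
    // are collapsed so that each sentence stays on a single line.
    // Assumption: the input is already split into sentences by hand.
    public static String toTrainingText(List<List<String>> documents) {
        StringBuilder out = new StringBuilder();
        boolean firstDoc = true;
        for (List<String> doc : documents) {
            List<String> lines = new ArrayList<>();
            for (String sentence : doc) {
                String cleaned = sentence.replaceAll("\\s+", " ").trim();
                if (!cleaned.isEmpty()) {
                    lines.add(cleaned);
                }
            }
            if (lines.isEmpty()) {
                continue; // skip documents with no usable sentences
            }
            if (!firstDoc) {
                out.append('\n'); // empty line marks a document boundary
            }
            for (String line : lines) {
                out.append(line).append('\n');
            }
            firstDoc = false;
        }
        return out.toString();
    }
}
```

The resulting string can be written to a UTF-8 file and passed to the trainer as the training data.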

Upvotes: 2

Views: 1324

Answers (1)

Totem

Reputation: 474

In the end I took the document suggested in the first comment, plus some articles from Wikipedia, and achieved 98% precision, so it's fine :3
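
A precision figure like this can be computed by comparing the sentence end positions a detector predicts against a hand-checked reference. A minimal sketch, assuming boundaries are represented as character offsets into the same text. The class `BoundaryScore` and its methods are hypothetical names for illustration, not OpenNLP API:

```java
import java.util.HashSet;
import java.util.Set;

public class BoundaryScore {

    // Number of predicted boundaries that also appear in the reference.
    private static int countCorrect(Set<Integer> predicted, Set<Integer> reference) {
        Set<Integer> correct = new HashSet<>(predicted);
        correct.retainAll(reference);
        return correct.size();
    }

    // Precision = correct predicted boundaries / all predicted boundaries.
    public static double precision(Set<Integer> predicted, Set<Integer> reference) {
        if (predicted.isEmpty()) {
            return 0.0;
        }
        return (double) countCorrect(predicted, reference) / predicted.size();
    }

    // Recall = correct predicted boundaries / all reference boundaries.
    public static double recall(Set<Integer> predicted, Set<Integer> reference) {
        if (reference.isEmpty()) {
            return 0.0;
        }
        return (double) countCorrect(predicted, reference) / reference.size();
    }
}
```

It is worth looking at recall as well, since a detector that only splits on the most obvious boundaries can score high precision while missing many sentence ends. OpenNLP also ships its own evaluator for sentence detectors, which is probably the better choice in practice.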

Upvotes: 1
