Reputation: 2295

What kind of data should be used to train a new language for Tesseract OCR?

I want to find out what kind of data is used to train a new language for Tesseract OCR.

Is it individual characters, or do we have to prepare specific sentences?

Please point me to a source for this information; I couldn't find a clear explanation on the wiki page.

Upvotes: 0

Answers (1)

Reputation: 309

Try this page. It walks through the steps taken to get Tesseract to recognize Ancient Greek: http://www.eutypon.gr/eutypon/pdf/e2012-29/e29-a01.pdf

This is the general information from the Tesseract team about training Tesseract: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Upvotes: 1