Tesseract - training

Question

I am trying to train Tesseract.

I am using jTessBoxEditor and Serak.

First I create a .txt file containing, for example, 10,000 characters separated by single spaces. I use it as input to the TIFF/Box Generator in jTessBoxEditor, which produces a box file and a .tiff image.
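For anyone trying to reproduce this, the input file could be generated along these lines. This is only a minimal sketch in Python: the function name `make_training_text` is hypothetical, and the alphabet (uppercase letters plus digits) is an assumption, since the actual character set is whatever the training needs.

```python
import random
import string


def make_training_text(n_chars=10_000,
                       alphabet=string.ascii_uppercase + string.digits,
                       seed=0):
    """Return n_chars random characters drawn from alphabet,
    separated by single spaces.

    The alphabet is an assumption for illustration; replace it with
    the characters you actually want Tesseract to learn.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    return " ".join(rng.choice(alphabet) for _ in range(n_chars))


text = make_training_text()
print(text[:19])  # preview of the first ten characters
```

The resulting string can be saved to a .txt file and loaded in the TIFF/Box Generator as described above.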

Next I check the boxes and they look correct. I then use them in Serak to train Tesseract, which produces an xxx.traineddata file.

Now I want to verify the result. I create a small .txt file with, for example, 100 characters separated by spaces, all of which look very similar (the file contains something like 5 S 5 S 0 O 2 Z and so on). I generate a new .tiff from it the same way as for training: jTessBoxEditor, same font. Then I OCR this new .tiff in Serak, and the result is that 0 is confused with O, 5 with S, and so on.
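To put numbers on "0 is confused with O", the OCR output can be compared with the test text character by character. The following is a minimal sketch in Python, not something jTessBoxEditor or Serak provides: `count_confusions` is a hypothetical helper, and the `recognized` string is made-up example output, not a real result.

```python
from collections import Counter


def count_confusions(expected, recognized):
    """Compare two space-separated character strings position by position.

    Returns a Counter mapping (expected_char, recognized_char) to the
    number of times that misreading occurred.
    """
    exp = expected.split()
    rec = recognized.split()
    if len(exp) != len(rec):
        # A length mismatch means characters were dropped or merged,
        # so a positional comparison would be misleading.
        raise ValueError(f"length mismatch: {len(exp)} vs {len(rec)}")
    return Counter((e, r) for e, r in zip(exp, rec) if e != r)


expected = "5 S 5 S 0 O 2 Z"    # text used to render the test .tiff
recognized = "S S 5 5 O O 2 2"  # hypothetical OCR output, for illustration only
confusions = count_confusions(expected, recognized)
for (e, r), n in confusions.most_common():
    print(f"{e!r} read as {r!r}: {n}")
```

A table like this shows which pairs the trained data fails on and how often, which is more useful to report than "the characters are mixed".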

What am I doing wrong?

Answers (1)
