Diego Gonzalez
Reputation: 53

Can I configure Tesseract to detect only single letters and digits?

I'm trying to use Tesseract OCR to process specific cards that contain a matrix like this: [image: matrix of numbers]

Is there any way to configure Tesseract to extract only single letters?

The problem is that the columns of the matrix have letters as titles: "A B C D E F G H I". When I train using the BOX file, each letter is detected, but when I run the OCR process the letters are merged into a single word: "ABCDEFGHI". I need the letters separated because I need the bounds of each column (x, y, height, width), which will make processing each entire column more accurate.

Thanks,

Upvotes: 1

Views: 1312

Answers (1)

nguyenq
Reputation: 8345

If you can make the spacing between the letters large enough, Tesseract could pick up the gaps once you set the variable preserve_interword_spaces=1 (see doc).
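As a rough sketch of how that could be wired up (not tested against the cards in the question): the snippet below builds a Tesseract command line that sets preserve_interword_spaces=1 and asks for TSV output, then reads the word-level rows of that TSV to get a bounding box per recognised token. The helper names `build_command` and `word_boxes` are made up for illustration, the `--psm 6` choice is an assumption about the page layout, and the `--psm` spelling assumes Tesseract 4 or later (older releases use `-psm`). The sample TSV is hand-written to show the column layout, not real OCR output.

```python
import csv
import io


def build_command(image_path, output_base):
    """Return a tesseract CLI invocation as an argument list (not executed here)."""
    return [
        "tesseract", image_path, output_base,
        "--psm", "6",                         # assumption: treat the card as one uniform text block
        "-c", "preserve_interword_spaces=1",  # variable suggested in the answer
        "tsv",                                # write <output_base>.tsv with per-word boxes
    ]


def word_boxes(tsv_text):
    """Return (text, left, top, width, height) for every non-empty word row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # In Tesseract's TSV output, level 5 marks word-level rows.
        if row["level"] == "5" and text:
            boxes.append((text, int(row["left"]), int(row["top"]),
                          int(row["width"]), int(row["height"])))
    return boxes


# Hand-written sample in the TSV column layout, for illustration only.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t200\t50\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t5\t12\t14\t91\tA\n"
    "5\t1\t1\t1\t1\t2\t40\t5\t12\t14\t90\tB\n"
)

for text, left, top, width, height in word_boxes(SAMPLE_TSV):
    print(text, left, top, width, height)
```

If the header letters come back as separate word rows, each row's left and width give the horizontal extent of that column, which can then be used to crop the column for a second OCR pass.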

Upvotes: 1
