Antoine Driot

Reputation: 56

Tesseract options & image preprocessing

The image I am trying to OCR

Edit: As requested, here is the original image

Dear community, I am trying to do some OCR.
I have already pre-processed the image a lot (deskewing, cropping...).
Now I can read the digits myself with no problem,
but I can't get Tesseract to give me a meaningful result.

Click on the link at the top to see the image I am trying to OCR

Is there more pre-processing that I am missing?
Or am I calling Tesseract incorrectly?

I tried with no options at all, and with this:

config = ('--psm 13 -c tessedit_char_whitelist=0123456789')

Edit:

Funny thing, I tried multiple ways:

So this is just the very beginning for me. I would prefer to use Tesseract so as not to pay big bucks. I will see what I can do when my project is more advanced.

But I am eager to hear any suggestions you have about image preprocessing! :-)

Regards!

Upvotes: 3

Views: 18794

Answers (1)

Ramesh Kamath

Reputation: 309

There are three important flags you can pass to Tesseract: -l, --oem, and --psm.

  • The -l flag controls the language of the input text.

  • The --oem argument, or OCR Engine Mode, controls the type of algorithm used by Tesseract.

  • The --psm argument controls the automatic Page Segmentation Mode used by Tesseract.

To list the available options, use:

Pass these options like this: config = ("-l eng --oem 1 --psm 7")
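As a rough sketch of how such a config string might be put together and handed to pytesseract (assuming the pytesseract and Pillow packages are installed; build_tesseract_config and ocr_image are hypothetical helper names, not part of any library). The language is passed through pytesseract's lang argument here instead of -l, and the digit whitelist is the one from the question:

```python
def build_tesseract_config(psm=7, oem=1, whitelist=None):
    """Assemble a Tesseract config string (hypothetical helper).

    psm: page segmentation mode (7 treats the image as a single text line)
    oem: OCR engine mode (1 selects the LSTM engine)
    whitelist: optional string of characters to restrict recognition to
    """
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def ocr_image(path, lang="eng", **config_kwargs):
    """Run Tesseract on an image file and return the recognized text."""
    # Imported lazily so the config helper works without these packages.
    import pytesseract
    from PIL import Image

    config = build_tesseract_config(**config_kwargs)
    return pytesseract.image_to_string(Image.open(path), lang=lang, config=config)


if __name__ == "__main__":
    # Only prints the config string; no image is needed for this part.
    print(build_tesseract_config(psm=7, oem=1, whitelist="0123456789"))
```

A call such as ocr_image("digits.png", psm=7, whitelist="0123456789") would then run the OCR, where digits.png is a placeholder file name. Whether the whitelist is honored may depend on the Tesseract version and engine mode, so it is worth trying both --oem 0 and --oem 1 if the output looks wrong.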

Upvotes: 11
