Reputation: 56
Edit : As asked, here is the original image
Dear community,
I am trying to do some OCR.
I have already pre-processed the image a lot (deskew, crop...).
Now I can read the digits myself with no problem,
but I can't get Tesseract to give me a meaningful result.
Click on the link at the top to see the image I am trying to OCR.
Is there more pre-processing that I am missing?
Or am I calling Tesseract incorrectly?
I tried with no options at all, and with this:
config = ('--psm 13 -c tessedit_char_whitelist=0123456789')
Edit :
Funny thing, I tried multiple ways :
So it's the very beginning for me. I may prefer to use Tesseract so as not to pay big bucks. I will see what I can do when my project is more advanced.
But I am eager to hear your suggestions about image preprocessing!! :-)
So if you have any suggestions, I would welcome them.
Regards!
Upvotes: 3
Views: 18794
Reputation: 309
There are three important flags you can give Tesseract: -l, --oem, and --psm.
The -l flag controls the language of the input text.
The --oem argument, or OCR Engine Mode, controls the type of algorithm used by Tesseract.
The --psm argument controls the automatic Page Segmentation Mode used by Tesseract.
To list the available values, use:
tesseract --help-oem
for OEM, and
tesseract --help-psm
for PSM.
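If you would rather check these from a script, here is a minimal sketch that runs the same two help commands from Python. It assumes the tesseract binary is on your PATH; help_command and show_help are hypothetical helper names used only for this illustration.

```python
import shutil
import subprocess


def help_command(kind):
    """Build the command line that prints the OEM or PSM help."""
    if kind not in ("oem", "psm"):
        raise ValueError("kind must be 'oem' or 'psm'")
    return ["tesseract", f"--help-{kind}"]


def show_help(kind):
    """Return the help text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(help_command(kind), capture_output=True, text=True)
    # Assumption: the help text may arrive on stdout or stderr depending
    # on the build, so take whichever one is non-empty.
    return result.stdout or result.stderr


if __name__ == "__main__":
    for kind in ("oem", "psm"):
        print(show_help(kind) or "tesseract not found on PATH")
```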
Language codes are listed at https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
Use these options like this: config = ("-l eng --oem 1 --psm 7")
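As a rough sketch of how that string could be put together and used, the helper below assembles the three flags plus the digit whitelist from the question. build_tesseract_config is a hypothetical name, not part of any library. The OCR call assumes the pytesseract wrapper and Pillow are installed (the question's config variable suggests pytesseract, but that is my guess), and the image path is whatever you pass on the command line. Note that --psm 7 treats the image as a single text line, which may or may not suit your picture.

```python
import sys


def build_tesseract_config(lang="eng", oem=1, psm=7, whitelist=None):
    """Assemble a Tesseract option string from -l, --oem and --psm."""
    if oem not in range(4):    # `tesseract --help-oem` lists modes 0 to 3
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(14):   # `tesseract --help-psm` lists modes 0 to 13
        raise ValueError("psm must be between 0 and 13")
    parts = ["-l", lang, "--oem", str(oem), "--psm", str(psm)]
    if whitelist:
        # Restrict recognition to the given characters, as in the question
        parts += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return " ".join(parts)


if __name__ == "__main__":
    config = build_tesseract_config(psm=7, whitelist="0123456789")
    print(config)
    if len(sys.argv) > 1:
        try:
            import pytesseract          # assumption: pytesseract is installed
            from PIL import Image       # assumption: Pillow is installed
        except ImportError:
            print("pytesseract or Pillow not installed; skipping the OCR call")
        else:
            image = Image.open(sys.argv[1])
            print(pytesseract.image_to_string(image, config=config))
```

Run it with the image path as the only argument. Without an argument it only prints the option string, which you can compare against what you pass to Tesseract by hand.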
Upvotes: 11