Reputation: 1713
I am working on a project that needs some OCR. I picked Tesseract OCR for the job, but the results are poor. I have tried limiting the character set to 1234567890- but that did not help. Is there an optimal image size I can use, or some way to train Tesseract to recognise this kind of string better?
The image is this:
Tesseract returns 05175150152, which is not right. I expected better, since the image has not been modified in any way. I run Tesseract from PHP via exec with the following command:
"C:\Program Files\Tesseract-OCR\tesseract.exe" C:\wamp\www\adwords\phones\center_ctl09_ctl04.png sssd -l eng -psm 7 nobatch letters
Any ideas on what I am doing wrong?
Upvotes: 1
Views: 2014
Reputation: 8355
An image resolution of 96 DPI is tough for any OCR engine. Try rescaling it to 300 DPI and you should get better results.
Additionally, JPEG is a lossy image format. Use a different one, like TIFF or PNG, if possible.
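As a rough sketch of the rescaling step: going from 96 DPI to 300 DPI means enlarging the image by a factor of 300 / 96 = 3.125 in each dimension. The Python snippet below computes the target size and, assuming the third-party Pillow library is installed, resizes the image and saves it in a lossless format. The function names and the example file names are hypothetical, and Lanczos resampling is just one reasonable choice, not something Tesseract requires.

```python
def upscaled_size(width, height, src_dpi=96, dst_dpi=300):
    """Return the pixel size an image needs to go from src_dpi to dst_dpi."""
    scale = dst_dpi / src_dpi  # 300 / 96 = 3.125
    return round(width * scale), round(height * scale)


def rescale_for_ocr(src_path, dst_path, src_dpi=96, dst_dpi=300):
    """Upscale an image and save it losslessly (use a .png or .tif dst_path).

    Assumes Pillow is available; it is imported here so that the size
    calculation above can be used without it.
    """
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(src_path) as img:
        new_size = upscaled_size(img.width, img.height, src_dpi, dst_dpi)
        resized = img.resize(new_size, Image.LANCZOS)
        # Record the new resolution in the file metadata as well.
        resized.save(dst_path, dpi=(dst_dpi, dst_dpi))


if __name__ == "__main__":
    # Hypothetical dimensions; the real image size is not given in the question.
    print(upscaled_size(200, 40))
```

You could then point the tesseract command at the rescaled file, for example one produced by rescale_for_ocr("input.png", "input_300dpi.png"), instead of the original.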
Upvotes: 3