Evan

Reputation: 1713

OCR reading phone numbers with Tesseract

I am working on a project that includes some OCR. I picked Tesseract OCR for the job, but the results are poor. I have tried limiting the character set to 1234567890-, but that did not help. Is there an optimal image size I should use, or some way to train Tesseract to recognise this kind of string better?

The image is this: Phone

Tesseract returns 05175150152, which is wrong. I expected better, since the image has not been modified in any way. I call Tesseract from PHP via exec with the following command:

"C:\Program Files\Tesseract-OCR\tesseract.exe" C:\wamp\www\adwords\phones\center_ctl09_ctl04.png sssd -l eng -psm 7 nobatch letters

Any ideas on what I am doing wrong?

Upvotes: 1

Views: 2014

Answers (1)

nguyenq

Reputation: 8355

An image resolution of 96 DPI is tough for any OCR engine. Try rescaling it to 300 DPI and you should get better results.

Additionally, JPEG is a lossy image format. Use a different one, like TIFF or PNG, if possible.
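To make the rescaling suggestion concrete, here is a minimal sketch of the arithmetic for going from 96 DPI to 300 DPI. The function name `rescaled_size` and the 160x32 pixel dimensions are hypothetical, chosen only for illustration; the actual size of the image in the question is not known.

```python
def rescaled_size(width_px, height_px, source_dpi=96, target_dpi=300):
    """Return the pixel dimensions that keep the same physical size at target_dpi.

    Assumption: the image is treated as source_dpi (96 by default),
    so the scale factor is simply target_dpi / source_dpi.
    """
    scale = target_dpi / source_dpi  # 300 / 96 = 3.125
    return round(width_px * scale), round(height_px * scale)


# Hypothetical 160x32 px crop of a phone number
new_width, new_height = rescaled_size(160, 32)
print(new_width, new_height)  # 500 100
```

The resulting dimensions could then be passed to whatever image tool does the actual resize (for example, Pillow's `Image.resize`) before the enlarged image is handed to Tesseract.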

Upvotes: 3
