user3368457


tesseract OCR - Q detected as O

I am developing an application that reads an identification badge, using OpenCV and tesseract as the OCR engine. I wrote an OpenCV algorithm that handles text detection in order to produce a clean, "easy-to-read" image for the OCR engine. The image below illustrates what I get:

enter image description here

When I ask tesseract to "read" the image, I get "KO 978"... Searching for this "O/Q problem" with tesseract, I found only this post: https://groups.google.com/forum/#!topic/tesseract-issues/kEDIIpQ-9W4, but there the problem seems to be that the input image was not preprocessed properly (the response is that the image was not deskewed)...

Following the Improve Quality section of the wiki on GitHub, I went through all of its steps (and I think the image is clear enough), so I do not know what else I can do... I do not know whether training the OCR engine will help, but if possible I want to avoid it, because it is a lot of work and because the documentation does not recommend it.

I am using tesseract v3.03 from the console, not integrated into my app (so tesseract applies its own preprocessing to the input image).

Any ideas on how to solve this? Thanks!

Upvotes: 1

Views: 1652

Answers (1)

Eren V.

Reputation: 210

You can train your language file to improve accuracy. This article will help you with training.

While training the tesseract language file, pay attention to the unicharambigs file.
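As a rough illustration of what a unicharambigs entry looks like, here is a minimal Python sketch that writes one. It assumes the "v1" format as I understand it from the Tesseract 3.x training documentation: a `v1` header line, then tab-separated fields (source character count, space-separated source characters, target character count, space-separated target characters, and a type flag where 0 is an optional hint and 1 is a mandatory replacement). The helper names `ambig_line` and `build_unicharambigs` are hypothetical, and the format should be checked against the documentation for your tesseract version before use.

```python
def ambig_line(source, target, mandatory=False):
    """Format one ambiguity entry (assumed v1 layout, tab-separated fields)."""
    src = list(source)
    tgt = list(target)
    fields = [
        str(len(src)),              # number of source characters
        " ".join(src),              # source characters, space-separated
        str(len(tgt)),              # number of target characters
        " ".join(tgt),              # target characters, space-separated
        "1" if mandatory else "0",  # 1 = always replace, 0 = optional hint
    ]
    return "\t".join(fields)


def build_unicharambigs(entries):
    """Build the file contents from (source, target, mandatory) tuples."""
    lines = ["v1"] + [ambig_line(s, t, m) for s, t, m in entries]
    # The file is expected to end with a newline character.
    return "\n".join(lines) + "\n"


# Optional hint that a recognized "O" might really be a "Q".
content = build_unicharambigs([("O", "Q", False)])
```

An optional entry (type 0) seems the safer choice here, since a mandatory O to Q replacement would also rewrite every genuine "O".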

Another option: you can preprocess the image with binarization/thresholding.
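To show what binarization/thresholding means in practice, here is a minimal, dependency-free Python sketch of Otsu's method on an 8-bit grayscale image stored as a list of rows. The function names `otsu_threshold` and `binarize` are hypothetical. Since you already use OpenCV, you would more likely call `cv2.threshold` with `cv2.THRESH_BINARY | cv2.THRESH_OTSU`; the code below only illustrates the idea.

```python
def otsu_threshold(pixels):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Tiny made-up image: dark values near 10, bright values near 200.
image = [[10, 12, 200], [11, 210, 205]]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
```

Whether dark text ends up as 0 or 255 depends on the badge; invert the comparison if tesseract expects dark text on a light background.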

Upvotes: 1
