Reputation: 732
I have an image created from an old fax document (the font is specific). Generally Tesseract works pretty well with this input, except for one case: when a line starts with many leading asterisks ('*'), it is ignored.
The result produced by the OCR differs depending on the psm:
For queries please contact NA, KRKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKK KK KK KK
In every case the title "comment" is skipped.
But when I manually removed all the '*' characters from the image in Paint, the OCR worked fine. I have no idea how to get the OCR to work without preprocessing the image. Can someone explain this?
Upvotes: 0
Views: 268
Reputation: 3328
Try this: tesseract 9UIKs.png - --psm 4 --oem 0
Which produces:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkkkk COMMENT kkkkxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
For queries p1ease contact NA.
XXXXXKKKKXXXVXKKKKKXXXXXKXXXXXXXXXXXXKXXXXXXXXXXXXXXXXKKXXXXXXXXXXXXXXXXXKXXXX.
You will need a language model that supports the legacy engine, which --oem 0 selects (available from https://github.com/tesseract-ocr/tessdata).
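If you want to call this from a script, here is a minimal Python sketch that wraps the same command line. The function names and the optional tessdata_dir parameter are my own, for illustration only; it assumes the tesseract binary is on your PATH and that the directory you pass contains a traineddata file with legacy engine support.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, psm: int = 4, oem: int = 0,
                            tessdata_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for the tesseract CLI.

    "-" as the output base makes tesseract write the text to stdout.
    """
    cmd = ["tesseract", image_path, "-", "--psm", str(psm), "--oem", str(oem)]
    if tessdata_dir is not None:
        # Hypothetical location of the downloaded legacy-capable model files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path: str, psm: int = 4, oem: int = 0,
            tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract and return the recognized text (raises on failure)."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm, oem, tessdata_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling run_ocr("9UIKs.png") should be equivalent to the command above. Passing a different psm value makes it easy to compare the page segmentation modes on the same image.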
Upvotes: 1