I recently followed some tutorials to set up Tesseract, and now I am trying to check whether the OCR works properly. When I take a picture and extract the text, I sometimes get non-English characters; the output looks like gibberish. Here is an example of the output I got:
; .'—--~_~:~ ear
.::§—‘.::~__>‘Z~r'.‘ ,::-SES‘:3£a"3'§_“5.E.~ °?®.=_-
.—_;%~‘=*c§u-5; H =—oc+-»o cn-5 '55:.
The picture I took was of the first page of the research article in this link. I'm not sure why this is happening. I also have the eng.traineddata file in the tessdata subdirectory.
There are two things that come to mind:
For the editing, I can recommend ImageMagick.
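Tesseract generally does best on clean, high-contrast text, so one common editing step before OCR is to turn a photo into pure black and white. The sketch below shows Otsu's global thresholding in plain Python. It assumes the photo has already been decoded into a flat list of 8-bit grayscale values (0 to 255), and `otsu_threshold` and `binarize` are hypothetical helper names for illustration, not part of Tesseract's API. In practice you would let ImageMagick or an image library do this step.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best separates dark text from a light background.

    Otsu's method: pick the threshold that maximizes the between-class variance
    of the two groups (pixels <= t versus pixels > t).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0        # weighted sum of levels at or below t
    weight_bg = 0     # number of pixels at or below t
    best_t = 0
    best_var = -1.0

    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue                  # no pixels in the lower class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                     # every pixel is in the lower class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Usage: dark "ink" pixels mixed with a light "paper" background.
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
bw = binarize(gray, t)
```

A global threshold like this suits evenly lit scans. For a phone photo with shadows or uneven lighting, a local (adaptive) threshold is often the better choice.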