Tesseract returns non-English characters

Question

I recently followed some tutorials to set up Tesseract, and now I am trying to check whether the OCR is working properly. When I take a picture and extract the text, I sometimes get non-English characters; the output actually looks like gibberish. Here is an example of the output I got:

 ; .'—--~_~:~ ear
 .::§—‘.::~__>‘Z~r'.‘ ,::-SES‘:3£a"3'§_“5.E.~ °?®.=_-
 .—_;%~‘=*c§u-5; H =—oc+-»o cn-5 '55:.

The picture I took was the first page of the research article in this link. I'm not sure why this is happening. I also have the eng.traineddata file in the tessdata subdirectory.


Answers (1)
