Adam Brewer

Reputation: 143

(Py)Tesseract failing to read text from a simple image

Sample Image

PyTesseract (Tesseract 4.0) simply refuses to spit out any prediction whatsoever, no matter which psm value I use, no matter how large or small I resize the image, and whether or not I apply Gaussian and/or median blurs. I've tried almost everything I've read that could make the image easier to recognize, even a .traineddata file that was made with the EXACT font in the picture.

What else can I do? This seems like a pretty simple image to read... Am I doing something stupid?

Excerpt (excluding some attempts at blur):

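import cv2
import pytesseract

def load(name):
    # cv2.imread returns None instead of raising when the file can't be read
    img = cv2.imread('resources/' + name)
    if img is None:
        raise FileNotFoundError('resources/' + name)
    return img

img = load('2048.png')
img = cv2.resize(img, (1500, 1500))          # force a fixed 1500x1500 size
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # grayscale before thresholding
# Pixels brighter than 220 become 0 (black), everything else 255 (white)
_, thresh = cv2.threshold(img, 220, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('f', thresh)
# Tesseract 4.x expects the page segmentation mode as --psm (two dashes)
print(pytesseract.image_to_string(thresh, lang='Clear', config='--psm 7'))
while True:
    if cv2.waitKey(0) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()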

Clear is my custom .traineddata file; I've tried eng as well. As previously stated, I've also tried every psm configuration.
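
To try every page segmentation mode systematically rather than by hand, a loop like the one below could help. It is a minimal sketch: `sweep_psm` is a hypothetical helper (not part of pytesseract), and it assumes the OCR function follows the calling convention of `pytesseract.image_to_string(image, lang=..., config=...)`. Tesseract 4.x documents the option as `--psm` with two dashes; the single-dash `-psm` form belongs to older releases. Modes 0 to 2 are skipped because 0 only does orientation and script detection, 2 is not implemented, and 1 additionally needs the OSD data file.

```python
from typing import Callable, Dict


def sweep_psm(image, ocr: Callable[..., str], lang: str = "eng") -> Dict[int, str]:
    """Run OCR once per page segmentation mode (3 to 13) and collect the output.

    `ocr` is assumed to accept (image, lang=..., config=...) like
    pytesseract.image_to_string.
    """
    results: Dict[int, str] = {}
    for psm in range(3, 14):
        # Tesseract 4.x spells the option with two dashes: --psm
        text = ocr(image, lang=lang, config=f"--psm {psm}")
        # Strip surrounding whitespace, including the trailing newline or
        # form feed that Tesseract typically appends to its output
        results[psm] = text.strip()
    return results
```

With pytesseract installed, `sweep_psm(thresh, pytesseract.image_to_string, lang='Clear')` would return a dict mapping each mode to its (possibly empty) text, so it is easy to see whether any mode produces output at all.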

Upvotes: 1

Views: 379

Answers (1)

Adam Brewer

Reputation: 143

I solved it myself. The problem was that the image was too large. I had been under the impression that bigger is better, since that seemed to be what I was reading, but I decided to reduce the size to see whether that was the issue. It was! Everything works perfectly now.
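
The answer doesn't say what size finally worked, so the following is only a sketch of one way to shrink the image instead of forcing it to 1500x1500: scale it so its longer side is at most `max_side` pixels, keep the aspect ratio, and never upscale. `fit_within`, `shrink_for_ocr`, and the default cap of 500 pixels are hypothetical choices for illustration, not values from the answer.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Return (new_width, new_height) so the longer side is at most max_side.

    The aspect ratio is preserved and images that already fit are unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def shrink_for_ocr(img, max_side: int = 500):
    """Downscale an OpenCV image (NumPy array) so its longer side is <= max_side."""
    import cv2  # imported here so fit_within() has no OpenCV dependency

    h, w = img.shape[:2]  # works for both color and grayscale arrays
    new_w, new_h = fit_within(w, h, max_side)
    if (new_w, new_h) == (w, h):
        return img
    # INTER_AREA is the interpolation OpenCV recommends for shrinking
    return cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)
```

In the question's code, `img = shrink_for_ocr(img)` would take the place of the fixed `cv2.resize(img, (1500, 1500))` call; trying a few `max_side` values and checking the OCR output is probably the simplest way to find a size that works for a given image.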

Upvotes: 3
