(Py)Tesseract failing to read text from simple image

Question

PyTesseract (Tesseract 4.0) refuses to output any prediction at all, no matter what -psm value I use, how large or small I make the image, or whether I apply Gaussian and/or median blurs. I've tried almost everything I've read that could improve the image for recognition, even using a .traineddata file that was made with the EXACT font in the picture.

What else can I do? This seems to be a pretty simple image to read from... Am I doing something stupid?

Excerpt (excluding some attempts at blur):

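import cv2
import pytesseract


def load(name):
    # Read an image from the resources directory (OpenCV loads it as BGR).
    return cv2.imread('resources/' + name)


img = load('2048.png')
# Resize to 1500x1500 before OCR.
img = cv2.resize(img, (1500, 1500))
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Inverse binary threshold: pixels above 220 become 0, all others become 255.
_, thresh = cv2.threshold(img, 220, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('f', thresh)
print(pytesseract.image_to_string(thresh, lang='Clear', config='-psm 7'))
# Keep the preview window open until 'q' is pressed.
while True:
    if cv2.waitKey(0) == ord('q'):
        break
cv2.destroyAllWindows()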

Clear is my .traineddata file; I've tried eng as well. As previously stated, I've tried all psm configurations too.

Adam Brewer · Accepted Answer

I solved it on my own. The issue was that the image was too large. I had been under the impression that bigger is better, since everything I was reading suggested so, but I decided to reduce the size to see whether that was the problem. It was! Everything works perfectly now.
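For anyone hitting the same problem, here is a minimal sketch of how one might search for a size that works instead of guessing. It is not the exact code I ended up with: the helper names `scaled_sizes` and `ocr_at_sizes` and the list of target heights are made up for illustration, and it uses the `--psm` spelling of the flag, which Tesseract 4 accepts. The 220 threshold is carried over from the question.

```python
def scaled_sizes(width, height, target_heights=(32, 48, 64, 100, 200)):
    """Return (w, h) pairs that keep the aspect ratio at each target height.

    The default heights are arbitrary illustrative values, not recommendations.
    """
    sizes = []
    for target in target_heights:
        new_width = max(1, round(width * target / height))
        sizes.append((new_width, target))
    return sizes


def ocr_at_sizes(path, lang="eng", target_heights=(32, 48, 64, 100, 200)):
    """Run Tesseract on the image at several sizes; return {(w, h): text}."""
    # Imported here so scaled_sizes() can be used without OpenCV installed.
    import cv2
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)

    height, width = gray.shape
    results = {}
    for size in scaled_sizes(width, height, target_heights):
        # INTER_AREA when shrinking, INTER_CUBIC when enlarging.
        interp = cv2.INTER_AREA if size[1] < height else cv2.INTER_CUBIC
        resized = cv2.resize(gray, size, interpolation=interp)
        # Same inverse threshold as in the question.
        _, thresh = cv2.threshold(resized, 220, 255, cv2.THRESH_BINARY_INV)
        text = pytesseract.image_to_string(thresh, lang=lang, config="--psm 7")
        results[size] = text.strip()
    return results
```

Calling `ocr_at_sizes('resources/2048.png')` and printing the returned dictionary shows which sizes, if any, produce non-empty text. Thresholding after the resize keeps the image strictly black and white at every size.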
