Igor Markovic

Reputation: 185

Why does pytesseract not recognize correctly?

OK, so I've been preprocessing my image in every way I can think of, but I cannot seem to find the right settings.

This is the image: (image not shown)

As you can see, the picture is already about as simple as it gets, but pytesseract still cannot recognize '1 BB' from it. Any tips?

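import numpy as np
import pytesseract
from PIL import Image, ImageEnhance

# img is assumed to already be a NumPy array at this point
img = Image.fromarray(img)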
imp_arr = np.asarray(img)
imp_arr = (np.floor(imp_arr / 140.0) * 255.0).astype('uint8')
img = Image.fromarray(imp_arr, mode='L')
width, height = img.size 
img = img.resize((width*3, height*3), Image.BICUBIC)
width, height = img.size 
img = img.resize((width*2, height*2), Image.HAMMING)
width, height = img.size 
img = img.resize((int(width*0.3), int(height*0.3)), Image.BICUBIC)
img = ImageEnhance.Brightness(img).enhance(0.7)
img = ImageEnhance.Sharpness(img).enhance(2)
img = ImageEnhance.Contrast(img).enhance(2)
amount = pytesseract.image_to_string(img, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

This is just one example of what I've tried in order to get the correct text out as a string. Sometimes it works, other times it prints gibberish. The thing is, it needs to work every single time, especially for a picture as clear as this one. Does anyone have a simple solution to this problem? Thank you in advance.

Upvotes: 1

Views: 1141

Answers (1)

Pierre

Reputation: 1099

After installing Tesseract OCR, Pillow and pytesseract, I saved your image as igor.png and ran the following code, which I found in the pytesseract docs:

#!/usr/bin/env python

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open("igor.png")))

It prints the expected result:

1BB

Your original code also works if I tweak it slightly by adding the letter B to tessedit_char_whitelist. As posted, the whitelist only contains the digits 0-9, so Tesseract is not allowed to output a B at all.
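To make the whitelist change concrete, here is a minimal sketch. The helper names build_config and read_amount are my own and not part of pytesseract, and it assumes the image is saved as igor.png as above. The psm and oem values are simply the ones from your question.

```python
def build_config(whitelist, psm=10, oem=3):
    """Build a Tesseract config string that restricts output to `whitelist`.

    build_config is a hypothetical helper, not a pytesseract function.
    """
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_amount(path, whitelist="0123456789B"):
    """Run OCR on the image at `path`, allowing only characters in `whitelist`."""
    # Imported lazily so build_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), config=build_config(whitelist))


# Usage (needs Tesseract, Pillow and pytesseract installed):
#     print(read_amount("igor.png"))
```

One more thing you could try, though I have not tested it on your image: --psm 10 tells Tesseract to treat the image as a single character, while --psm 7 treats it as a single line of text, which may be a better fit for something like '1 BB'. With the sketch above that would be build_config("0123456789B", psm=7).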

Upvotes: 1
