Reputation: 63
I'm trying to read the digits from this image using pytesseract with these settings:
custom_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(img, config=custom_config)
This is the output:
((E ST7 [71aT6T2 ] THETOGOG5 15 [8)
Upvotes: 2
Views: 2979
Reputation: 402
Whitelisting only digits and changing the page segmentation mode (psm) gives much better results. You also need to strip newlines and whitespace from the output. The code below does that.
import pytesseract
import re
from PIL import Image
#Open image
im = Image.open("numbers.png")
#Define configuration that only whitelists number characters
custom_config = r'--oem 3 --psm 11 -c tessedit_char_whitelist=0123456789'
#Find the numbers in the image
numbers_string = pytesseract.image_to_string(im, config=custom_config)
#Remove every non-digit character (letters, whitespace, newlines)
numbers_only = re.sub(r'\D', '', numbers_string)
#Print the output
print(numbers_only)
The result of the code on your image is: '31477423353'
Unfortunately, a few digits are still missing. As an experiment, I downloaded your image and erased the grid.
After removing the grid and executing the code again, pytesseract produces a perfect result: '314774628300558'
So consider removing the grid programmatically before running OCR. There are alternatives to pytesseract, but whichever tool you use, you will get better output when the text is isolated in the image.
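One possible way to remove the grid in code is sketched below. This is not how the grid was erased in the experiment above. It is only a rough illustration, and it assumes that the grid lines are dark, axis-aligned, and span most of the image, while the digits do not touch them. The function name erase_grid_lines and both of its parameters are made up for this example, and the default values are guesses that would need tuning for a real image.

```python
import numpy as np


def erase_grid_lines(gray, dark_threshold=128, line_fraction=0.6):
    """Return a copy of a grayscale image with long dark rows/columns painted white.

    Hypothetical helper: a row or column counts as a grid line when at least
    `line_fraction` of its pixels are darker than `dark_threshold`.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    dark = gray < dark_threshold

    # Fraction of dark pixels in every row and every column
    row_is_line = dark.mean(axis=1) >= line_fraction
    col_is_line = dark.mean(axis=0) >= line_fraction

    # Paint the detected lines white on a copy, leaving the input untouched
    cleaned = gray.copy()
    cleaned[row_is_line, :] = 255
    cleaned[:, col_is_line] = 255
    return cleaned
```

To try it, the image could be loaded with np.array(Image.open("numbers.png").convert("L")), passed through the function, converted back with Image.fromarray(...), and then handed to pytesseract.image_to_string as before. If the grid is skewed or the digits overlap the lines, this simple row/column test will not be enough, and a morphological approach is probably a better fit.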
Upvotes: 5