Andresnex

Reputation: 87

pytesseract can't recognise digits from an image

The image I'm trying to analyze is the following:

[image]

I'm running this code:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

my_image = 'C:\\autobot_wwe_supercard\\imagenes\\codigo_arriba.png'
text = pytesseract.image_to_string(Image.open(my_image))

print(text)

The result it gives me is:

[image]

I installed pytesseract from the console with pip install pytesseract.

Upvotes: 0

Views: 374

Answers (2)

Tarun Chakitha

Reputation: 416

>>> import cv2
>>> import pytesseract
>>> img = cv2.imread("1299.png")
>>> gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>> # Otsu's method picks the binarisation threshold automatically
>>> thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
>>> # invert the binary image
>>> thresh = 255 - thresh
>>> data = pytesseract.image_to_string(thresh, config='--psm 11 digits')
>>> data
'1299'

Try whitelisting digits in the configuration, as the digits config in the call above does. pytesseract can sometimes extract white text on a black background too.
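If the digits config file is not available in your Tesseract install, the whitelist can also be passed explicitly through the tessedit_char_whitelist variable. The sketch below is only one way to do it: the helper names digits_config and read_digits are made up for illustration, and --psm 11 is simply carried over from the call above. As far as I know the whitelist is ignored by the LSTM engine in Tesseract 4.0 and works again from 4.1, so the result is filtered in Python as well.

```python
def digits_config(psm=11):
    """Build a Tesseract config string that only allows the characters 0-9."""
    # tessedit_char_whitelist is a Tesseract variable set with -c;
    # psm=11 (sparse text) is an assumption taken from the answer above.
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_digits(image, psm=11):
    """OCR an image (PIL image or NumPy array) and keep only the digits found."""
    # Imported here so digits_config() stays usable without Tesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=digits_config(psm))
    # Drop whitespace and any stray non-digit characters from the output.
    return "".join(ch for ch in text if ch.isdigit())
```

With the thresholded image from above you would call read_digits(thresh). If the image holds a single line of digits, psm=7 (single text line) may be worth trying too.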

Upvotes: 1

akms

Reputation: 51

pytesseract is not the best choice. Try putting some padding around the text when you crop the region of interest.
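A minimal sketch of the padding step, assuming the crop is a 2-D grayscale NumPy array (as returned by cv2.cvtColor) with dark text on a light background. The function name pad_for_ocr, the 20 pixel border and the white fill are my own choices for illustration, not values anyone has tuned.

```python
import numpy as np


def pad_for_ocr(gray, border=20, fill=255):
    """Return a copy of a 2-D grayscale image with a constant border on every side."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    # fill=255 adds a white margin, which assumes dark text on a light background;
    # use fill=0 if your crop is light text on a dark background.
    return np.pad(gray, border, mode="constant", constant_values=fill)
```

Pass the padded array to pytesseract.image_to_string in place of the tight crop. cv2.copyMakeBorder does the same job if you would rather stay inside OpenCV.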

Upvotes: 0
