naiveprogrammer

Reputation: 11

Pytesseract won't read the number in the image

I have a small image file that was cropped from a larger original image based on a matching criterion, and I need to extract the data from this crop. No matter what I try, pytesseract does not return any text for this image. Is there something else I can try?

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('rois/roi11.jpg')
data = pytesseract.image_to_boxes(img, output_type=Output.DICT)
print(data)

[Image: small image with a digit]

I have also tried scaling the image up and applying a threshold, with no luck:

import cv2 
import pytesseract
img = cv2.imread('rois/roi11.jpg')
img2 = cv2.resize(img, (0, 0), fx=2, fy=2)
gry = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
data = pytesseract.image_to_string(thr)
print(data)

Upvotes: 1

Views: 61

Answers (1)

Antonio Abrantes

Reputation: 591

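This code works for me. The important change is --psm 7, which tells Tesseract to treat the image as a single line of text instead of running its default full-page segmentation: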

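# gry is the grayscale image from the question's code.
# --tessdata-dir and lang='por' point at a local Portuguese traineddata file;
# adjust or drop them for your own setup.
config_tesseract = '--tessdata-dir tessdata --psm 7'
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
texto = pytesseract.image_to_string(thr, lang='por', config=config_tesseract)
print(texto)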
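
If it helps, here is a slightly fuller sketch of the same idea, combining the question's upscaling and Otsu threshold with an explicit page segmentation mode. The helper names build_config and read_digits, the scale factor, the border width, and the digits-only whitelist are my own assumptions for illustration, not something from the original code. The whitelist variable (tessedit_char_whitelist) is a standard Tesseract setting, but whether it is honoured depends on the Tesseract version and engine, so treat it as optional.

```python
def build_config(psm=7, digits_only=True):
    """Build a Tesseract config string for a small crop (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if digits_only:
        # Assumption: the crop contains only digits, so restrict the character set.
        parts.append("-c tessedit_char_whitelist=0123456789")
    return " ".join(parts)


def read_digits(path, scale=3, border=10, psm=7):
    """Preprocess a small crop and run Tesseract on it (hypothetical helper)."""
    # Imported here so build_config can be used without OpenCV or Tesseract installed.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)

    # Grayscale, upscale, then binarise with Otsu, as in the question.
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gry = cv2.resize(gry, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # Tesseract tends to do better with dark text on a light background,
    # so invert the image if it came out mostly dark.
    if thr.mean() < 127:
        thr = cv2.bitwise_not(thr)

    # Add a white margin so the glyph does not touch the image edge.
    thr = cv2.copyMakeBorder(thr, border, border, border, border,
                             cv2.BORDER_CONSTANT, value=255)

    return pytesseract.image_to_string(thr, config=build_config(psm)).strip()
```

You would call it as print(read_digits('rois/roi11.jpg')). If --psm 7 still returns nothing, --psm 10 (single character) or --psm 8 (single word) are worth trying for a crop that holds just one digit.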

Upvotes: 0
