User_123917425

Reputation: 1

Pytesseract numbers image to text

I am trying to use pytesseract to extract numbers from images.

It works for some of them (1, 2, 3, 5, 6, 20...) but I would like to make it work for all of them.

Here is a sample of the data that I'm using:

Example for 2

Example for 7

Example for 415

The images are very small (75x26) and the font is pretty poor but clearly human readable.

I tried some preprocessing because, as is, Tesseract detects very little. Even with preprocessing, I'm having trouble detecting numbers like 7, 11, 415...

I've tested so many things but here's a version that detects most of my numbers:

from PIL import Image, ImageEnhance
import pytesseract

image = Image.open(f"{i}.png")
gray = image.convert("L")  # convert to 8-bit grayscale
enhancer = ImageEnhance.Contrast(gray)
image_enhanced = enhancer.enhance(2.0)  # double the contrast
text = pytesseract.image_to_string(image_enhanced, config="--psm 6 -c tessedit_char_whitelist=0123456789")
print(text)

I've also tried applying a threshold with cv2, but without much more success, despite what sometimes seem to be cleaner numbers (1 not found, 0 found for 11, 3 found for 9...).

Different 1's during preprocessing

Different 9's during preprocessing

Different 11's during preprocessing

import cv2
import pytesseract

img = cv2.imread(f'/content/{i}.png')  # cv2.imread returns BGR channel order
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert directly from BGR
_, thresh = cv2.threshold(gray, 170, 255, cv2.THRESH_BINARY)
text = pytesseract.image_to_string(thresh, config="--psm 6 -c tessedit_char_whitelist=0123456789")

Any more steps in preprocessing that I should add to get better results?

I know that I could "fine tune" Tesseract, but this looks like a basic task, so I was wondering whether I was doing something wrong before losing more time...

Upvotes: 0

Views: 27

Answers (1)

Gonzalo Odiard

Reputation: 1355

You may get better results with black numbers on a white background. See "Why can't Pytesseract recognize plain white text on black?" and "image processing to improve tesseract OCR accuracy".
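As a rough sketch of that suggestion, the snippet below inverts the image when its background looks dark, so the digits end up dark on a light background. It reuses PIL from the question. The helper names `prepare_for_ocr` and `read_number`, the 4x upscale, the 10-pixel white border, and the switch to `--psm 7` (single text line) are my own assumptions to experiment with, not something confirmed to fix these particular images.

```python
from PIL import Image, ImageOps, ImageStat


def prepare_for_ocr(image, scale=4, border=10):
    """Return a dark-on-light, enlarged, padded grayscale copy of `image`.

    `scale` and `border` are illustrative defaults, not tuned values.
    """
    gray = image.convert("L")
    # A mean below mid-gray suggests light digits on a dark background,
    # so invert to get dark digits on a light background.
    if ImageStat.Stat(gray).mean[0] < 128:
        gray = ImageOps.invert(gray)
    # Enlarge the small image (assumption: bigger glyphs may be easier to read).
    width, height = gray.size
    gray = gray.resize((width * scale, height * scale), Image.LANCZOS)
    # Add a white margin so the digits do not touch the image edge.
    return ImageOps.expand(gray, border=border, fill=255)


def read_number(path):
    """OCR one image, treating it as a single line of digits."""
    import pytesseract  # imported here so the preprocessing works without it

    prepared = prepare_for_ocr(Image.open(path))
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    return pytesseract.image_to_string(prepared, config=config).strip()
```

Calling `read_number(f"{i}.png")` in place of the original `image_to_string` call lets you compare results. If the images are already dark on light, the inversion step is skipped and only the resize and padding apply.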

Upvotes: 0
