Sky withoutstars

Reputation: 11

Improving the output of tesseract while detecting phone numbers

I ran into a problem reading phone numbers from images. I tried to detect them with Tesseract, but it sometimes returns the wrong result. For example, the number is 8 995 005-81-86, but Tesseract outputs 8 995 0005-81-86. How can I fix this? Would binarizing the image help?


The code is basic:

import pytesseract as pt
from PIL import Image

img = Image.open('1.png')
number = pt.image_to_string(img)

print(number)

https://i.sstatic.net/kvhAq.png

Upvotes: 1

Views: 304

Answers (1)

K41F4r

Reputation: 1551

For best results, pass Tesseract an image with black text on a white background:

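import cv2
import pytesseract
from PIL import Image

# Load the image and convert it to grayscale
img = cv2.imread('kvhAq.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Binarize: pixels brighter than 100 become white (255), the rest black (0)
ret, thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)
im = Image.fromarray(thresh.astype("uint8"))
print(pytesseract.image_to_string(im))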
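
To make the thresholding step concrete, here is a small pure-Python sketch of what a fixed-level binary threshold does to pixel values. The `binarize` helper and the sample `patch` are hypothetical and only for illustration; in practice `cv2.threshold` does this for you. The `invert` flag mirrors the inverted variant, which may help if your source image has light text on a dark background and needs flipping to black on white.

```python
def binarize(gray, thresh=100, maxval=255, invert=False):
    """Apply a fixed-level binary threshold to a 2-D list of gray levels (0-255).

    Pixels strictly above `thresh` become `maxval`, the rest become 0.
    With invert=True the two outcomes are swapped.
    """
    out = []
    for row in gray:
        new_row = []
        for px in row:
            above = px > thresh
            if invert:
                above = not above
            new_row.append(maxval if above else 0)
        out.append(new_row)
    return out


# Hypothetical 2x4 patch: dark strokes on a light background
patch = [[200, 30, 40, 210],
         [190, 120, 25, 99]]

print(binarize(patch))               # [[255, 0, 0, 255], [255, 255, 0, 0]]
print(binarize(patch, invert=True))  # [[0, 255, 255, 0], [0, 0, 255, 255]]
```

Note that the mid-gray pixel (120) ends up white and the 99 ends up black, so the choice of threshold decides which faint strokes survive. The value 100 may need tuning for other images.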


Upvotes: 1
