Parth Patel

Reputation: 43

Why won't Tesseract detect this letter?

I am trying to detect this letter, but Tesseract doesn't seem to recognize it.

import cv2
import pytesseract as tess
img = cv2.imread("letter.jpg")
imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(tess.image_to_string(imggray))

This is the image in question:

A letter

Upvotes: 0

Views: 180

Answers (2)

Davide Fiocco

Reputation: 5924

Preprocessing the image (e.g. inverting it) should help, and you can also take advantage of the config options of pytesseract's image_to_string.

For instance, something along these lines:

import pytesseract
import cv2 as cv
import requests
import numpy as np

# Read the image directly from its URL
response = requests.get('https://i.sstatic.net/LGFAu.jpg')
response.raise_for_status()
nparr = np.frombuffer(response.content, np.uint8)
img = cv.imdecode(nparr, cv.IMREAD_GRAYSCALE)
# simple inversion as preprocessing
neg_img = cv.bitwise_not(img)
# invoke tesseract with options (--psm 7: treat the image as a single text line)
text = pytesseract.image_to_string(neg_img, config='--psm 7')

print(text)

This should parse the letter correctly.
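Why inversion can matter: as far as I know, Tesseract's documentation on improving output quality recommends dark text on a light background for version 4.x and later. Below is a minimal sketch of deciding automatically whether to invert. The helper name ensure_dark_text_on_light and the "mean below mid-grey means light text on a dark background" rule are assumptions for illustration (they rely on the background covering most of the image), not part of pytesseract or OpenCV.

```python
import numpy as np


def ensure_dark_text_on_light(gray: np.ndarray) -> np.ndarray:
    """Return a uint8 grayscale image with (probably) dark text on a light background.

    Hypothetical heuristic: the background covers most of the image, so a mean
    intensity below mid-grey suggests light text on a dark background.
    """
    if gray.dtype != np.uint8 or gray.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale image")
    if gray.mean() < 128:
        # For uint8 input this equals cv.bitwise_not(gray).
        return 255 - gray
    return gray
```

In the code above it would be used as neg_img = ensure_dark_text_on_light(img) in place of the unconditional cv.bitwise_not call.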

Have a look at related questions for some additional info about preprocessing and tesseract options:
Why does pytesseract fail to recognise digits from image with darker background?
Why does pytesseract fail to recognize digits in this simple image?
Why does tesseract fail to read text off this simple image?
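On the config options: --psm sets Tesseract's page segmentation mode, where 7 means "single text line", 8 "single word" and 10 "single character". For an image containing one letter it may be worth trying several modes. This is a small sketch; the function read_single_glyph and the order of modes tried are hypothetical choices, and the OCR function is passed in so the loop can be tested without Tesseract installed.

```python
from typing import Callable, Optional, Sequence, Tuple


def read_single_glyph(
    image,
    ocr: Callable[..., str],
    psm_modes: Sequence[int] = (10, 8, 7),
) -> Tuple[Optional[int], str]:
    """Try each --psm mode in turn; return (mode, text) for the first non-empty result."""
    for psm in psm_modes:
        # pytesseract.image_to_string accepts extra CLI flags via `config`.
        text = ocr(image, config=f"--psm {psm}").strip()
        if text:
            return psm, text
    return None, ""
```

With the preprocessed image from the answer this would be called as read_single_glyph(neg_img, pytesseract.image_to_string).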

Upvotes: 2

Ahx

Reputation: 8005

@Davide Fiocco's answer is definitely correct.

I just want to show another way of doing it, with adaptive thresholding.

When you apply adaptive thresholding, the result will be:

[Image: the letter after adaptive thresholding]

Now when you read it:

txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)

Result:

B

Code:


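import cv2
import pytesseract

img = cv2.imread("LGFAu.jpg")  # the image from the question
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Mean adaptive threshold: each pixel is compared with the mean of its
# 11x11 neighbourhood minus 2. With THRESH_BINARY_INV, pixels at or below
# that local threshold become 252 and all others become 0.
thr = cv2.adaptiveThreshold(gry, 252, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 11, 2)
# --psm 7: treat the image as a single text line
txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)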
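To show what cv2.adaptiveThreshold computes with ADAPTIVE_THRESH_MEAN_C and THRESH_BINARY_INV, here is a simplified NumPy re-implementation. It is an illustrative sketch, not OpenCV's code, and may not match OpenCV bit-for-bit; the function name adaptive_mean_threshold_inv is hypothetical. It follows the documented rule: threshold T(x, y) = mean of the blockSize x blockSize neighbourhood minus C, output 0 where the pixel is above T and maxValue otherwise, with replicated borders.

```python
import numpy as np


def adaptive_mean_threshold_inv(gray, max_value=255, block_size=11, c=2):
    """Simplified NumPy sketch of mean adaptive thresholding (BINARY_INV).

    For every pixel, the local threshold is the mean of its
    block_size x block_size neighbourhood minus c. Pixels at or below
    that threshold become max_value; all others become 0.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    k = block_size
    # Replicate edge pixels so border pixels get a full neighbourhood.
    padded = np.pad(g, k // 2, mode="edge")
    # Summed-area table: sat[i, j] is the sum of padded[:i, :j].
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Sum over each k x k window, then divide by the window size.
    window_sum = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
                  - sat[k:k + h, :w] + sat[:h, :w])
    local_mean = window_sum / (k * k)
    return np.where(g <= local_mean - c, max_value, 0).astype(np.uint8)
```

Called as adaptive_mean_threshold_inv(gry, 252, 11, 2), it mirrors the parameters in the code above. Note that on a perfectly uniform region the pixel equals its local mean, so with C = 2 it stays above the threshold and becomes 0; only pixels noticeably darker than their surroundings are set to maxValue.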

Upvotes: 1
