philM

Reputation: 11

Python tesseract OCR detecting text improperly

testimg.png:

I am trying to detect the text in this image and Tesseract gives me nothing. It barely works for some other text: parts of the word come back with random letters swapped in, and I have to compare the output against a word list to recover the right word.
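For context, the word-list comparison step is roughly like this (just a sketch for illustration; the word list is a placeholder and difflib is only one way to do the fuzzy match):

import difflib

WORD_LIST = ["TreasureHunter", "VIII"]  # placeholder word list, not my real one

def closest_word(ocr_word):
    # Pick the closest entry from the word list, or keep the raw OCR word if nothing is close enough
    matches = difflib.get_close_matches(ocr_word, WORD_LIST, n=1, cutoff=0.6)
    return matches[0] if matches else ocr_word

ocr_text = "Treasurellunter VII1"  # example of a garbled Tesseract result
print(" ".join(closest_word(w) for w in ocr_text.split()))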

This is the image I am trying to read. I have tried lots of things with grayscale conversion and inversion before passing it to Tesseract, but nothing seems to work; it keeps returning nothing. How can I fix it, or do I need to train it?

Here is my current code:

import cv2
import pyautogui
import pytesseract

# Capture the screen region that contains the name text
pyautogui.screenshot("testimg.png", region=(1050, 30, 455, 50))

originalImage = cv2.imread('testimg.png')

# Convert the image to grayscale
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)

# Apply an inverted threshold to get a binary image
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)

# Treat the image as a single line of text (--psm 7)
custom_config = r'--psm 7'
text = pytesseract.image_to_string(blackAndWhiteImage, config=custom_config)
print('Extracted Text: ', text)

Upvotes: 0

Views: 131

Answers (1)

Ro.oT

Reputation: 2083

You can try PaddleOCR, which extracts the text from this image without any pre-processing. Install the library via:

pip install paddlepaddle paddleocr  # install paddlepaddle-gpu instead if you have a GPU

Then, for your reference image, you can use the following snippet to extract the text:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')  # `lang` can also be set to 'ch', 'fr', 'german', 'korean', 'japan'
img_path = '/path/to/test_img.png'
result = ocr.ocr(img_path, cls=True)
print(result)  # The result is a list; each item contains a text box, the text and the recognition confidence

which prints out:

[[[[[76.0, 13.0], [352.0, 8.0], [353.0, 39.0], [77.0, 44.0]], ('TreasureHunter VII', 0.9374205470085144)]]]
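If you only need the recognized string and its confidence rather than the full structure, you can index into the nested list (a sketch based on the output shown above; the exact nesting can vary slightly between paddleocr versions):

# result[0] is the list of detected lines; each line is [box, (text, confidence)]
for line in result[0]:
    box, (text, confidence) = line
    print(text, confidence)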

Additionally, if you apply binary thresholding as you did in your code, you may get a slightly better result:

import cv2

img_path = '/path/to/test_img.png'
originalImage = cv2.imread(img_path)

# Grayscale, then inverted binary threshold, exactly as in the question
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imwrite('gray.jpg', blackAndWhiteImage)

Image after binary thresholding:

And then use PaddleOCR:

from paddleocr import PaddleOCR

blackAndWhiteImage_path = '/path/to/gray.jpg'
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr(blackAndWhiteImage_path, cls=True)
print(result)  # Notice that it now correctly extracts all the Roman numerals

which prints out:

[[[[[77.0, 13.0], [353.0, 8.0], [354.0, 37.0], [78.0, 42.0]], ('TreasureHunterVIII', 0.9578518271446228)]]]
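If you would rather skip writing gray.jpg to disk, recent paddleocr versions also accept a NumPy array directly, so you can pass the thresholded image in memory (a rough sketch, assuming that ndarray input is supported in your version):

import cv2
from paddleocr import PaddleOCR

originalImage = cv2.imread('/path/to/test_img.png')
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)

ocr = PaddleOCR(use_angle_cls=True, lang='en')
# Pass the thresholded array instead of a file path (assumption: ndarray input is accepted)
result = ocr.ocr(blackAndWhiteImage, cls=True)
print(result)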

Upvotes: 2
