Reputation: 11
Image: testimg.png
I am trying to detect the text in this image with Tesseract, but it returns nothing. It barely works on some other text: some of the letters in a word come back replaced with random characters, and I have to compare the result against a word list to recover the right word.
This is the image I am trying to read. I have tried many combinations of grayscale conversion and inversion, but nothing seems to work and the output is always empty. How can I fix this, or train Tesseract for it?
Here is my current code:
import cv2
import pyautogui
import pytesseract
pyautogui.screenshot("testimg.png", region=(1050, 30, 455, 50)) #name
originalImage = cv2.imread('testimg.png')
# Convert the image to grayscale
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
# Apply a threshold to get a binary image
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
custom_config = r'--psm 7'
text = pytesseract.image_to_string(blackAndWhiteImage, config=custom_config)
print('Extracted Text: ', text)
Upvotes: 0
Views: 131
Reputation: 2083
You can try PaddleOCR to extract the text without any pre-processing (for this case). Install the library via:
pip install paddlepaddle paddleocr # install paddlepaddle-gpu instead if you have a GPU
Then, for your reference image, you can use the following snippet to extract the text:
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en') # You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
img_path = '/path/to/test_img.png'
result = ocr.ocr(img_path, cls=True)
print(result) # The result is a list, each item contains a text box, text and recognition confidence
which prints out:
[[[[[76.0, 13.0], [352.0, 8.0], [353.0, 39.0], [77.0, 44.0]], ('TreasureHunter VII', 0.9374205470085144)]]]
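The nested result can be awkward to work with directly. As a minimal sketch, assuming the result keeps the shape printed above (one list per image, each entry a box followed by a `(text, confidence)` pair), a small helper can flatten it. `extract_text` and `min_confidence` are hypothetical names for illustration, not part of PaddleOCR:

```python
def extract_text(result, min_confidence=0.0):
    """Flatten an OCR result into (text, confidence) pairs.

    Assumes the nested shape printed above: a list with one entry per
    image, each holding [box, (text, confidence)] items.
    """
    pairs = []
    for page in result or []:
        # Assumption: a page may be empty or None when nothing is detected
        for box, (text, confidence) in page or []:
            if confidence >= min_confidence:
                pairs.append((text, confidence))
    return pairs


# The literal output shown above, used here as sample input
sample = [[[[[76.0, 13.0], [352.0, 8.0], [353.0, 39.0], [77.0, 44.0]],
            ('TreasureHunter VII', 0.9374205470085144)]]]
print(extract_text(sample))
```

Raising `min_confidence` lets you discard low-confidence reads instead of passing them further along.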
Additionally, if you apply binary thresholding as you did in your code, you may get a slightly better result:
import cv2
img_path = '/path/to/test_img.png'
originalImage = cv2.imread(img_path)
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imwrite('gray.jpg', blackAndWhiteImage)
Image after binary thresholding:
And then use PaddleOCR:
from paddleocr import PaddleOCR
blackAndWhiteImage_path = '/path/to/gray.jpg'
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr(blackAndWhiteImage_path, cls=True)
print(result) # The Roman numeral is now read in full (VIII rather than VII)
which prints out:
[[[[[77.0, 13.0], [353.0, 8.0], [354.0, 37.0], [78.0, 42.0]], ('TreasureHunterVIII', 0.9578518271446228)]]]
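Since the question mentions comparing the OCR output against a word list, here is a rough sketch of that step using the standard library's `difflib`. The `closest_name` helper, the `known_names` list and the `cutoff` value are assumptions for illustration; spaces are ignored because the second output above comes back without one:

```python
import difflib


def closest_name(ocr_text, known_names, cutoff=0.6):
    """Return the known name most similar to ocr_text, or None."""
    def normalize(s):
        # Compare case-insensitively and without spaces
        return s.replace(' ', '').lower()

    # Map each normalized name back to its original spelling
    lookup = {normalize(name): name for name in known_names}
    matches = difflib.get_close_matches(
        normalize(ocr_text), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical word list
known_names = ['TreasureHunter VIII', 'TreasureHunter VII', 'GoldDigger IX']
print(closest_name('TreasureHunterVIII', known_names))
```

Names that differ by a single character, such as VII and VIII, can still be confused when the OCR output is noisy, so this works best as a fallback after the recognition itself is reasonably accurate.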
Upvotes: 2