Unable to read white text on black background using pytesseract

Question

I'm trying to read and get the location of the white text on the black background using pytesseract, but am not having any success. Here is an example of an image that I'm dealing with.

Here is the code:

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread("ocr_example.png")
img = cv2.bitwise_not(img)  # invert: dark pixels become light and vice versa
_, binary = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)  # pixels > 150 -> 255, else 0

# --oem 3: default engine mode; --psm 6: assume a single uniform block of text
custom_config = r'--oem 3 --psm 6'
d = pytesseract.image_to_data(binary, output_type=Output.DICT, config=custom_config)

print(d["text"])
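For reference, the two preprocessing calls above are simple per-pixel operations. The sketch below restates them in NumPy for a single-channel uint8 image (the helper name invert_and_threshold and the toy pixel values are illustrative, not from the question); on a 3-channel BGR image, cv2.threshold applies the same rule to each channel. It also shows why one global inversion cannot fix a mixed image: every region flips, so whichever region was dark-on-light becomes light-on-dark. Tesseract's "Improving the quality of the output" documentation recommends dark text on a light background for 4.x, so a mixed-polarity image likely needs per-region handling.

```python
import numpy as np


def invert_and_threshold(gray: np.ndarray, thresh: int = 150) -> np.ndarray:
    """Mirror cv2.bitwise_not + cv2.threshold(..., THRESH_BINARY) on uint8 data."""
    inverted = 255 - gray  # bitwise NOT of a uint8 value equals 255 - value
    # THRESH_BINARY: pixels strictly above thresh become 255, all others 0
    return np.where(inverted > thresh, 255, 0).astype(np.uint8)


# Toy 1x4 grayscale "image" holding two regions side by side:
#   region A = light background (230) with a dark text pixel (20)
#   region B = dark text pixel... no: dark background (20) with a light text pixel (230)
toy = np.array([[230, 20, 20, 230]], dtype=np.uint8)
result = invert_and_threshold(toy)
print(result)  # [[  0 255 255   0]]
# Region A is now light text on a dark background, region B dark text on light:
# a single global inversion cannot make both regions dark-on-light.
```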

Here is the output of the text found:

['', '', '', '', 'Home', 'Address', '', 'Use', 'Current', 'Location', '', '>', '', 'Unable', 'to', 'find', 'location']

If I save the white text on black background to its own file and scan, the text is found without a problem. But I need to get the location of the text on the image as a whole.
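Note that image_to_data already returns positions: the dictionary's left, top, width, height and conf lists are aligned with text. A minimal sketch (WordBox and word_boxes are hypothetical names) that keeps only non-empty words with their pixel boxes; the optional dx, dy shift maps boxes from a cropped image back to the full image, since coordinates from a crop are relative to the crop's top-left corner. The sample dictionary holds made-up values shaped like an Output.DICT result.

```python
from typing import Dict, List, NamedTuple


class WordBox(NamedTuple):
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def word_boxes(data: Dict[str, list], min_conf: float = 0.0,
               dx: int = 0, dy: int = 0) -> List[WordBox]:
    """Collect non-empty words and their boxes from an image_to_data dict.

    dx, dy: top-left corner of the crop inside the full image (0 if not cropped).
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        # conf may be int, float or str depending on pytesseract/Tesseract version
        conf = float(data["conf"][i])
        # Empty rows are page/block/paragraph/line entries (conf -1) or whitespace;
        # skip them along with low-confidence words.
        if not str(text).strip() or conf < min_conf:
            continue
        boxes.append(WordBox(
            text=str(text),
            left=int(data["left"][i]) + dx,  # shift crop coords to the full image
            top=int(data["top"][i]) + dy,
            width=int(data["width"][i]),
            height=int(data["height"][i]),
            conf=conf,
        ))
    return boxes


# Made-up values shaped like image_to_data(..., output_type=Output.DICT)
sample = {
    "text": ["", "Home", "Address"],
    "left": [0, 10, 70],
    "top": [0, 5, 5],
    "width": [200, 55, 80],
    "height": [30, 20, 20],
    "conf": [-1, 96.0, 95.5],
}
for box in word_boxes(sample):
    print(box.text, box.left, box.top, box.width, box.height)
# Home 10 5 55 20
# Address 70 5 80 20
```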

I've tried many of the preprocessing suggestions on sites like https://nanonets.com/blog/ocr-with-tesseract/, but none of them helped. I don't mind doing a second search that finds only the missing text.
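One way to run that second search, sketched under stated assumptions: the white-on-black areas are horizontal bands spanning roughly the full image width (as a button bar might), and the image has already been converted to grayscale (for example with cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)). The hypothetical helpers dark_bands and ocr_dark_bands find runs of rows whose mean intensity is dark, invert each band so its text becomes dark on light, pad it with a white border, OCR it through a caller-supplied function, and shift the word boxes back into full-image coordinates. The thresholds (dark_thresh=100, min_height=5, pad=10) are guesses to tune, not values from the question.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np


def dark_bands(gray: np.ndarray, dark_thresh: float = 100.0,
               min_height: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges (end exclusive) where each row's mean
    intensity is below dark_thresh and the run is at least min_height tall."""
    is_dark = gray.mean(axis=1) < dark_thresh
    bands: List[Tuple[int, int]] = []
    start = None
    for y, dark in enumerate(is_dark):
        if dark and start is None:
            start = y  # a dark run begins
        elif not dark and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None  # the dark run ended
    if start is not None and len(is_dark) - start >= min_height:
        bands.append((start, len(is_dark)))  # run reaches the bottom edge
    return bands


def ocr_dark_bands(gray: np.ndarray, ocr: Callable[[np.ndarray], Dict[str, list]],
                   pad: int = 10, dark_thresh: float = 100.0,
                   min_height: int = 5) -> List[dict]:
    """Invert each dark band, OCR it, and map word boxes to full-image coords."""
    words = []
    for y0, y1 in dark_bands(gray, dark_thresh, min_height):
        # Invert the band (white-on-black -> black-on-white), add a white border
        crop = np.pad(255 - gray[y0:y1], pad, constant_values=255)
        data = ocr(crop)
        for i, text in enumerate(data["text"]):
            if not str(text).strip():
                continue
            words.append({
                "text": str(text),
                "left": int(data["left"][i]) - pad,     # remove the border
                "top": int(data["top"][i]) - pad + y0,  # and the band offset
                "width": int(data["width"][i]),
                "height": int(data["height"][i]),
            })
    return words
```

With pytesseract, the ocr argument could be something like lambda crop: pytesseract.image_to_data(crop, config="--psm 6", output_type=Output.DICT), since pytesseract accepts NumPy arrays directly. If the dark area is a narrower box rather than a full-width band, its rows may not average below the threshold, and a contour-based search on a thresholded mask would fit better.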

Ahx · Accepted Answer

Result:

Home Address

Enter address here

Use Current Location

Unable to find location


DISMISS SAVE

Code:

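import cv2
import pytesseract

img = cv2.imread("hm-adrs.png")
img = cv2.bitwise_not(img)  # same inversion as in the question
_, binary = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)
# --psm 4: assume a single column of text of variable sizes (instead of --psm 6)
txt = pytesseract.image_to_string(binary, config="--oem 3 --psm 4")
print(txt)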
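The answer prints plain text, while the question also asks for positions. A sketch, assuming image_to_data is called on the same binary image with the same --oem 3 --psm 4 config (whether that returns every word the string output does has not been checked here): group words by Tesseract's block_num, par_num and line_num fields and take the union of their boxes. line_boxes is a hypothetical helper and the sample dictionary holds made-up values.

```python
from typing import Dict, List, Tuple


def line_boxes(data: Dict[str, list]) -> List[dict]:
    """Merge words from image_to_data(..., output_type=Output.DICT) into lines
    keyed by (block_num, par_num, line_num), each with one bounding box."""
    lines: Dict[Tuple[int, int, int], dict] = {}  # dict keeps Tesseract's order
    for i, text in enumerate(data["text"]):
        if not str(text).strip():
            continue  # skip structural rows and whitespace
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        x0, y0 = int(data["left"][i]), int(data["top"][i])
        x1, y1 = x0 + int(data["width"][i]), y0 + int(data["height"][i])
        if key not in lines:
            lines[key] = {"words": [], "box": (x0, y0, x1, y1)}
        entry = lines[key]
        entry["words"].append(str(text))
        bx0, by0, bx1, by1 = entry["box"]
        # Grow the line's box so it also covers this word
        entry["box"] = (min(bx0, x0), min(by0, y0), max(bx1, x1), max(by1, y1))
    return [{"text": " ".join(e["words"]), "box": e["box"]} for e in lines.values()]


# Made-up values shaped like image_to_data output for two lines
sample = {
    "block_num": [1, 1, 1, 2, 2, 2, 2],
    "par_num": [1, 1, 1, 1, 1, 1, 1],
    "line_num": [1, 1, 1, 1, 1, 1, 1],
    "text": ["", "Home", "Address", "", "Use", "Current", "Location"],
    "left": [0, 10, 70, 0, 10, 50, 130],
    "top": [0, 5, 6, 0, 60, 61, 60],
    "width": [0, 55, 80, 0, 35, 75, 80],
    "height": [0, 20, 19, 0, 18, 18, 19],
}
for line in line_boxes(sample):
    print(line["text"], line["box"])
# Home Address (10, 5, 150, 25)
# Use Current Location (10, 60, 210, 79)
```

Against a real image this would be data = pytesseract.image_to_data(binary, config="--oem 3 --psm 4", output_type=Output.DICT) followed by line_boxes(data); each box is (x0, y0, x1, y1) in pixels of the image passed to Tesseract.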
