Correctly extract text from image using Tesseract OCR

Question

I have been trying to extract the bold white text from this image but cannot get it working correctly: the 9 is read as a 3 and the I as a 1.

I have looked at various sites that have code for improving the image quality, but I can't get any of it to work. Is anyone able to help me with this one? The desired output is "I6M-9U".

import cv2
import pytesseract

def get_text_from_image(image: cv2.Mat) -> str:
    pytesseract.pytesseract.tesseract_cmd = r'C:\Tesseract-OCR\tesseract.exe'

    # Crop image to only get the piece I am interested in
    top, left, height, width = 25, 170, 40, 250
    crop_img = image[top:top + height, left:left + width]

    # Make it bigger (the scaling is a percentage, so 1500 means 15x)
    resize_scaling = 1500
    resize_width = int(crop_img.shape[1] * resize_scaling / 100)
    resize_height = int(crop_img.shape[0] * resize_scaling / 100)
    resized_dimensions = (resize_width, resize_height)

    # Resize it
    crop_img = cv2.resize(crop_img, resized_dimensions, interpolation=cv2.INTER_CUBIC)

    # --psm 6 tells Tesseract to treat the crop as a single uniform block of text
    return str(pytesseract.image_to_string(crop_img, config="--psm 6"))
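
One thing that might be worth trying alongside the function above is a stricter Tesseract configuration. The sketch below only builds the config string. `build_ocr_config` is a hypothetical helper name, and the allowed character set (uppercase letters, digits and a hyphen) is an assumption based on the single example "I6M-9U". `--psm 7` (treat the image as a single text line) and `tessedit_char_whitelist` are existing Tesseract options, although how well the whitelist is honoured depends on the Tesseract version and engine.

```python
import string


def build_ocr_config(
    psm: int = 7,
    allowed: str = string.ascii_uppercase + string.digits + "-",
) -> str:
    """Build a Tesseract config string (hypothetical helper, not part of pytesseract)."""
    # Tesseract page segmentation modes run from 0 to 13
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    # Assumption: the code only contains the characters listed in `allowed`
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


config = build_ocr_config()
print(config)
```

The resulting string can be passed as `config=` to `pytesseract.image_to_string`. Because both I and 1 (and both 9 and 3) are in the allowed set, a whitelist like this cannot settle those particular confusions on its own. It only rules out characters that can never appear.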

UPDATED CODE

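# With THRESH_OTSU the threshold is chosen automatically, so the 120 is ignored.
# Otsu thresholding expects an 8-bit single-channel (grayscale) image.
ret, thresh1 = cv2.threshold(image, 120, 255, cv2.THRESH_BINARY +
                                              cv2.THRESH_OTSU)

cv2.imshow("image", thresh1)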

This now removes all the background artifacts, but the first letter I is still read as 1 and the 9 is still read as 3.
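
For reference, when `THRESH_OTSU` is set OpenCV does not use the threshold value passed in. It picks one from the image histogram by maximising the variance between the dark and bright pixel classes. The plain-Python sketch below illustrates that selection on a flat list of 8-bit pixel values. It is not OpenCV's implementation, and `otsu_threshold` is a hypothetical name used only for illustration.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximises between-class variance.

    Pixels <= t form the dark class, pixels > t form the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0   # number of pixels with value <= t
    sum_dark = 0      # sum of those pixel values
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to split
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break     # every pixel is in the dark class from here on
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        # Between-class variance, up to a constant factor of 1 / total**2
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


# Toy "image": dark background around 30, bright text around 220
pixels = [30] * 50 + [220] * 50
t = otsu_threshold(pixels)
binary = [255 if p > t else 0 for p in pixels]
```

On a crop with clean white text over a darker background, the chosen threshold falls somewhere between the two brightness groups, which would explain why the background artifacts disappear whatever fixed value is passed in.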
