How to help Tesseract identify the character in this simple image?

Question

Here is the original image I want to process:

https://i.sstatic.net/opbz8.jpg

After processing the image with OpenCV (cv2), I got the following result:

https://i.sstatic.net/XCH5O.jpg

Even with this cleaned-up image, Tesseract is unable to recognize the character. The same thing happens with many other images in this style.

Any suggestions on how to improve the image quality, or on using a different Tesseract mode, would be most welcome.

If those approaches won't work, please suggest an alternative, such as training Tesseract or using another OCR engine or method.

Thank you

Edit: I am including the code as well

    from subprocess import call

    import cv2

    # Read the image
    im = cv2.imread("image.jpg")
    # Convert to grayscale and apply Gaussian filtering
    im_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
    # Threshold the image (inverted: text becomes white on black)
    ret, im_th = cv2.threshold(im_gray, 90, 255, cv2.THRESH_BINARY_INV)

    # Find contours in the image; [-2:] keeps (contours, hierarchy) whether
    # findContours returns two values (OpenCV 2.x/4.x) or three (3.x)
    ctrs, hier = cv2.findContours(im_th.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

    # Get the bounding rectangle (x, y, w, h) of each contour
    rects = [cv2.boundingRect(ctr) for ctr in ctrs]
    for x, y, w, h in rects:
        # Only consider rects which are bigger than a certain area
        if w * h <= 300:
            continue
        # Draw the rectangle
        cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 3)
        # Make a square region around the digit; clamp at 0 so a negative
        # start index does not wrap around to the far side of the image
        leng = int(h * 1.6)
        pt1 = max(y + h // 2 - leng // 2, 0)
        pt2 = max(x + w // 2 - leng // 2, 0)
        roi = im_th[pt1:pt1 + leng, pt2:pt2 + leng]
        # Invert the image such that the text is black and background is white
        roi = 255 - roi
        # roi is the final processed image
        try:
            cv2.imwrite("test.jpg", roi)
            # Run the terminal command: tesseract test.jpg out -psm 10
            # (mode 10 = single character; Tesseract 4+ spells the flag --psm)
            call(["tesseract", "test.jpg", "out", "-psm", "10"])
            with open("out.txt") as f:
                text = f.read()
        except (IOError, OSError):
            # Tesseract is missing or produced no output file
            continue
        if text:
            print(text)
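
The square-crop arithmetic from the loop can also be checked on its own, without OpenCV or Tesseract. The following is only a minimal sketch: the helper name `square_roi_bounds`, its `scale` parameter, and the clamping to the image bounds are assumptions added for illustration and are not part of the code above, which uses the fixed factor 1.6.

```python
def square_roi_bounds(rect, img_shape, scale=1.6):
    """Return (top, bottom, left, right) of a square crop centred on rect.

    rect is (x, y, w, h), the layout returned by cv2.boundingRect.
    img_shape is (height, width) of the thresholded image.
    The side length is scale * h, as in the loop above.
    Clamping to the image bounds is an assumption made for this sketch.
    """
    x, y, w, h = rect
    side = int(h * scale)
    # Centre the square on the middle of the bounding rectangle
    top = y + h // 2 - side // 2
    left = x + w // 2 - side // 2
    # Keep the crop inside the image so slicing never wraps around
    top = max(top, 0)
    left = max(left, 0)
    bottom = min(top + side, img_shape[0])
    right = min(left + side, img_shape[1])
    return top, bottom, left, right


if __name__ == "__main__":
    # Hypothetical 20x20 box inside a 100x100 image
    print(square_roi_bounds((10, 10, 20, 20), (100, 100)))
```

With bounds computed this way, the crop would be `im_th[top:bottom, left:right]`, which stays non-empty even when the contour touches an image edge.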
