LJ247

Reputation: 21

How to OCR a simple digit using Python (EasyOCR, Tesseract, etc.)?

I started with Python recently and decided that the best way to learn it is by solving a real problem rather than just following tutorials. I'm trying to write a Python program that will help me with Nonograms, using a webcam capturing my smartphone's screen.

I have most of the parts done; however, I'm struggling with the OCR. Let me show you what I have so far.

I'm using OpenCV to read from the camera (which is held in a 3D-printed arm at a fixed distance from the phone's screen, with pretty much constant lighting conditions). Then I preprocess the frame (cv2.COLOR_BGR2GRAY, cv2.GaussianBlur, cv2.adaptiveThreshold, etc.), find contours (cv2.findContours), and check whether they are closed polygons, have 4 corners, and cover at least 30% of the screen area (to prevent smaller items from being picked up). Here is the result:

webcam_processed

Then I extract the selection, warp it (to get a nice rectangle in case the camera/phone was tilted) and use a similar approach to find contours and split the image into 4 sections:

Let's focus on the vertical rules. I process them again horizontally (GaussianBlur with a kernel > 1000 to make the numbers disappear but keep the lines) and vertically, apply a threshold, and do contour detection to find the number of columns and rows (in this example, it is 15x5). Here are the results:

vertical_rules

After that I have coordinates for all the squares in a 2-dimensional array. I do the same exercise for the horizontal rules and the play area. Finally, I extract all small squares from the image and store them in an array.

And now, the fun begins: I loop through the extracted images and apply OCR (so far EasyOCR works better than Tesseract). The results go into an array of the same shape, but holding the extracted text. After some processing (e.g. only numbers allowed, correct count, totals from both rules add up, etc.) I get 2 arrays like this:

vertical rules:   [[2], [1], [1, 1], [3, 2], [8], [6], [1, 4], [1, 1], [1], [1]]
horizontal rules: [[2], [2], [9], [1, 3], [1, 3], [2], [1, 1], [1, 1], [1, 1], [2, 2]]
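The "totals from both rules add up" sanity check mentioned above can be sketched in plain Python (the function name is illustrative): in a valid nonogram, the number of filled cells implied by the column clues must equal the number implied by the row clues.

```python
def rules_are_consistent(vertical_rules, horizontal_rules):
    """Sanity-check OCR output: the filled-cell totals implied by
    the column clues and the row clues must match."""
    v_total = sum(sum(clues) for clues in vertical_rules)
    h_total = sum(sum(clues) for clues in horizontal_rules)
    return v_total == h_total
```

For the two arrays above, both totals come to 33, so the check passes.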

I pass them to a nonogram solver class that outputs an array of 0/1 values (only black & white for the moment) representing the solved nonogram. I use it to highlight cells in the original camera view to show the solution:

solution_dog

However, with smaller fonts (like in the original image (15x15 board) or even in some 10x10 boards) the OCR can't handle the text. Depending on the font, 1 is usually confused with 7, 8 with 0, sometimes 4 with 0, etc. I tried various preprocessing steps to extract edges, blur the numbers, etc., but the results weren't great.

How can I get the OCR to give me better results? Please see some full-resolution (the camera gives a 1920x1080 frame) extracted numbers from the rules section (I will attach more once I get more reputation points :)):

1 3 4 7 11

If you could help me find a better way of extracting text (only numbers are possible, 0..30 is more than enough) from the attached squares above, it would be great!

Upvotes: 1

Views: 3665

Answers (2)

jun huang

Reputation: 1

According to my tests, PaddleOCR works well in most cases.

Upvotes: 0

LJ247

Reputation: 21

Thanks @nathancy.

I ended up with the following function that accepts the cut-out square with a single rule, processes it and returns a cleaner version for the OCR:

import cv2

def clean_square_for_OCR(image):
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu threshold, inverted so the digit is white on black
    thresh = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Morph open to remove noise
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

    # Find contours and remove small noise
    cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        area = cv2.contourArea(c)
        if area < 50:
            cv2.drawContours(opening, [c], -1, 0, -1)

    # Invert and apply slight Gaussian blur
    result = 255 - opening
    result = cv2.GaussianBlur(result, (5, 5), 0)

    return result

And then, within the main code, I had to tweak the Tesseract parameters, such as "--psm 8" (treat the image as a single word), restrict the allowed character set, and finally strip the trailing newline when no text was found (blank square):

processed_square = wpi.clean_square_for_OCR(square)
ocr_result = pytesseract.image_to_string(
    processed_square,
    lang='eng',
    config='--psm 8 -c tessedit_char_whitelist=0123456789'
).replace('\n\x0c', '')

Upvotes: 1
