GoodJuJu

Reputation: 1570

Using Tesseract to OCR matchTemplate Regions of Interest (ROI)

This is my very first attempt at using Python. I normally use .NET, but to identify shapes in documents I have turned to Python and OpenCV for image processing.

I am using OpenCV template matching (cv2.matchTemplate) to discover Regions of Interest (ROIs) in my documents.

This works well: the template matches the ROIs and rectangles are drawn to mark the matches.

The ROIs in my images contain text which I also need to extract via OCR. I am trying to do this with Tesseract, but judging by my results I think I am approaching it wrongly.

My process is shown in the code below.

In the image below, you can see the matched regions (which are fine), but the text printed by Tesseract (bottom right of each ROI) doesn't match the text inside the ROI.

Please could someone take a look and advise where I am going wrong?

import cv2
import numpy as np
import pytesseract
import imutils

img_rgb = cv2.imread('images/pd2.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)

template = cv2.imread('images/matchMe.png', 0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    roi = img_rgb[pt, (pt[0] + w, pt[1] + h)]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

cv2.imwrite('images/results.png', img_rgb)

Upvotes: 1

Views: 5708

Answers (1)

Knight Forked

Reputation: 1619

There were two issues in your code:

1. You were modifying the image (drawing the rectangle on it) before running OCR, so Tesseract saw the red border as well as the text.
2. roi was not constructed properly.
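The second issue comes down to NumPy indexing order: OpenCV images are arrays indexed [row, column], i.e. [y, x], so a crop must slice rows (y) first and columns (x) second. A minimal sketch with made-up match coordinates:

```python
import numpy as np

# OpenCV images are NumPy arrays indexed [row, column], i.e. [y, x].
img = np.arange(24).reshape(4, 6)   # 4 rows (height) by 6 columns (width)

x, y, w, h = 1, 2, 3, 2             # hypothetical match location and template size
roi = img[y:y + h, x:x + w]         # slice rows (y) first, then columns (x)

assert roi.shape == (h, w)          # height by width: (2, 3)
```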

import cv2
import numpy as np
import pytesseract

img_rgb = cv2.imread('tess.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)

template = cv2.imread('matchMe.png', 0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.45
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    roi = img_rgb[pt[1]:pt[1] + h, pt[0]: pt[0] + w]
    config = "-l eng --oem 1 --psm 7"
    text = pytesseract.image_to_string(roi, config=config)
    print(text)
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.putText(img_rgb, text, (pt[0] + w, pt[1] + h),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

cv2.imwrite('results.png', img_rgb)

You might still have to feed Tesseract a properly filtered image to get meaningful recognition. Hope this helps.

Upvotes: 1
