user299709

Reputation: 5412

How to automatically adjust the threshold for template matching with OpenCV?

So I am using OpenCV to do template matching, as below. I constantly need to fiddle with the visual-similarity #THRESHOLD, because it sometimes fails to discover matches or returns way too many. It's trial and error until it matches exactly one element at a position in a document. I'm wondering if there is any way to automate this.

The image.png file is a picture of a PDF document, and the template.png file is a picture of a paragraph. My goal is to discover all the paragraphs in the PDF document, and I want to know what kind of neural network would be useful here.

import cv2
import numpy as np


img = cv2.imread("image.png")
gimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
w, h = template.shape[::-1]


result = cv2.matchTemplate(gimg, template, cv2.TM_CCOEFF_NORMED)

loc = np.where(result >= 0.36)  # THRESHOLD
print(loc)

for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 3)

cv2.imwrite("output.png", img)

So, for instance, it would try every #THRESHOLD value from 0 to 1.0 and return the threshold that yields a single rectangle match (draws one green box, as above) in the image.

However, I can't help but feel this is very exhaustive. Is there a smarter way to find the threshold value?
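For reference, a brute-force version of that sweep can be sketched as follows; this is a minimal sketch operating directly on the matchTemplate score map, and the step size is an arbitrary choice:

```python
import numpy as np

def threshold_for_single_match(result, step=0.01):
    """Sweep candidate thresholds from high to low and return the first
    one where exactly one pixel of the score map clears it.
    No deduplication of adjacent above-threshold pixels is attempted."""
    for t in np.arange(1.0, 0.0, -step):
        if (result >= t).sum() == 1:
            return float(t)
    return None
```

Note that sweeping from the top downward like this effectively reduces to locating the maximum score, which suggests the sweep itself is unnecessary.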

Upvotes: 4

Views: 8112

Answers (2)

avgJoe

Reputation: 842

Since there were lots of comments and hardly any responses, I will summarize the answers for future readers.

First off, your question is almost identical to How to detect paragraphs in a text document image for a non-consistent text structure in Python. Also this thread seems to address the problem you are tackling: Easy ways to detect and crop blocks (paragraphs) of text out of image?

Second, detecting paragraphs in a PDF should not be done with template matching but with one of the following approaches:

  1. Using the Canny edge detector in combination with dilation and F1-score optimization. This is often used for OCR, as suggested by fmw42.
  2. Alternatively, you could use the Stroke Width Transform (SWT) to identify text, which you then group into lines and finally blocks, i.e. paragraphs. For OCR, these blocks can then be passed to Tesseract (as suggested by fmw42).

The key in any OCR task is to simplify the text-detection problem as much as possible by removing disruptive features, altering the image as needed. The more information you have about the image beforehand, the better: change colors, binarize, threshold, dilate, apply filters, etc.

To answer your question on finding the best match in template matching: check out nathancy's answer on template matching. In essence, it comes down to finding the maximum correlation value using minMaxLoc. See this excerpt from nathancy's answer:

    # Threshold resized image and apply template matching
    thresh = cv2.threshold(resized, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    detected = cv2.matchTemplate(thresh, template, cv2.TM_CCOEFF)
    (_, max_val, _, max_loc) = cv2.minMaxLoc(detected)

Also, a comprehensive guide to extracting text blocks from an image (without using template matching) can be found in nathancy's answer in this thread.

Upvotes: 2

thesylio

Reputation: 144

I would just change the threshold line to

loc = np.where(result == np.max(result))

This gives me the best matching positions, and then I can choose only one if I want to...
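As a toy illustration of that line on a hand-made score map (standing in for the matchTemplate result):

```python
import numpy as np

# Toy 2x2 score map standing in for the matchTemplate result
result = np.array([[0.1, 0.3],
                   [0.9, 0.2]])

ys, xs = np.where(result == np.max(result))
pt = (xs[0], ys[0])  # first best position as (x, y), matching the question's loop
# pt is (0, 1): the maximum 0.9 sits at row 1, column 0
```

If the maximum occurs at several positions (e.g. the template repeats exactly), the arrays hold all of them, so picking index 0 is how you "choose only one".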

Upvotes: 0
