Reputation: 5412
I am using OpenCV to do template matching as shown below. I constantly need to fiddle with the visual similarity threshold (marked #THRESHOLD), because sometimes it finds no matches and sometimes it returns far too many. It is trial and error until it matches exactly one element at one position in the document. I'm wondering if there is a way to automate this.
The image.png file is a picture of a PDF document. The template.png file is a picture of a paragraph. My goal is to discover all the paragraphs in the PDF document, and I want to know which neural network is useful here.
import cv2
import numpy as np

img = cv2.imread("image.png")
gimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
w, h = template.shape[::-1]  # shape is (rows, cols); reversed gives (width, height)

result = cv2.matchTemplate(gimg, template, cv2.TM_CCOEFF_NORMED)
loc = np.where(result >= 0.36) #THRESHOLD
print(loc)

# loc holds (rows, cols); reversing it makes each pt an (x, y) corner
for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 3)
cv2.imwrite("output.png", img)
For instance, it would try every #THRESHOLD value from 0 to 1.0 and return the threshold that yields a single rectangle match (the green box drawn above) in the image.
However, I can't help but feel this is very exhaustive. Is there a smarter way to find the right threshold value?
Upvotes: 4
Views: 8112
Reputation: 842
Since there were lots of comments and hardly any responses, I will summarize the answers for future readers.
First off, your question is almost identical to How to detect paragraphs in a text document image for a non-consistent text structure in Python. Also this thread seems to address the problem you are tackling: Easy ways to detect and crop blocks (paragraphs) of text out of image?
Second, detecting paragraphs in a PDF should not be done with template matching but with one of the following approaches:
The key in any OCR task is to simplify the text detection problem as much as possible by removing disruptive features of the image by altering the image as needed. The more information you have about the image you are processing beforehand the better: change colors, binarize, threshold, dilate, apply filters, etc.
To answer your question on finding the best match in template matching: check out nathancy's answer on template matching. In essence, it comes down to finding the maximum correlation value using minMaxLoc. See this excerpt from nathancy's answer:
# Threshold resized image and apply template matching
thresh = cv2.threshold(resized, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
detected = cv2.matchTemplate(thresh, template, cv2.TM_CCOEFF)
(_, max_val, _, max_loc) = cv2.minMaxLoc(detected)
Also, a comprehensive guide to extracting text blocks from an image (without using template matching) can be found in nathancy's answer in this thread.
Upvotes: 2
Reputation: 144
I would just change the thresholding line to:
loc = np.where(result == np.max(result))
This gives the best-matching positions, and then I can pick just one of them if I want to.
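A small sketch of what "choose only one" could look like when several positions tie for the maximum. The 3x3 array is a made-up stand-in for the score map returned by cv2.matchTemplate, and taking the first maximum in row-major order is just one possible tie-breaking rule.

```python
import numpy as np

# Stand-in for the score map from cv2.matchTemplate (rows = y, columns = x).
result = np.array([[0.10, 0.20, 0.15],
                   [0.30, 0.90, 0.40],
                   [0.20, 0.90, 0.10]], dtype=np.float32)

# np.where returns every position equal to the maximum, so ties give several points.
ys, xs = np.where(result == np.max(result))
all_best = list(zip(xs.tolist(), ys.tolist()))

# np.argmax returns only the first maximum in row-major order;
# unravel_index turns that flat index back into (row, column).
y, x = np.unravel_index(np.argmax(result), result.shape)
single_best = (int(x), int(y))

print(all_best, single_best)
```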
Upvotes: 0