Reputation: 11
I have images of Math question papers which have multiple questions per page. Example:
Math questions image
I want to use Python to extract the contents of each question separately and store them in a database table. From my research, I have a rough idea for my workflow: Pre-process image --> Find contours of each question --> Snip and send those individual images to pyTesseract --> Store the transcribed text.
I was very happy to find a great thread about a similar problem, but when I tried that approach on my image, the ROI that was identified covered the whole page. In other words, it identified all the questions as one block of text.
How do I make OpenCV recognize multiple ROIs within a page and draw bounding boxes? Is there something different to be done during the pre-processing?
Please suggest an approach - thanks so much!
Upvotes: 1
Views: 1869
Reputation: 6333
First, convert the image to grayscale.
Apply a light Gaussian blur, then Otsu's threshold, which binarizes the image and separates the text from the background.
Specify the structuring-element shape and kernel size. A larger kernel merges more neighbouring text into one rectangle; a smaller kernel gives smaller, more numerous rectangles. If the whole page comes back as a single box, try a smaller kernel.
Dilate the thresholded image with that kernel. Dilation thickens the text so that nearby characters join into one blob.
Find the contours of the dilated image.
Loop through the contours and draw each one's bounding rectangle with cv2.rectangle.
import cv2

# Load the page and convert it to grayscale
img = cv2.imread("text.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Blur slightly, then binarize with Otsu (inverted, so text becomes white)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
ret, thresh1 = cv2.threshold(blur, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)

# Dilate so that nearby characters merge into one blob per text block
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)

# OpenCV 4.x returns (contours, hierarchy); 3.x returns three values
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw the bounding rectangle directly on the loaded image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite('drawed.png', img)
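Once the boxes are found, the remaining steps of the workflow in the question (snip each region, transcribe it, store the text) could look roughly like the sketch below. It is not part of the original answer. The helper names `reading_order` and `save_questions`, the `min_area` threshold, and the `questions` table layout are all hypothetical, and the boxes and text are made-up stand-ins for real contour and OCR output.

```python
import sqlite3


def reading_order(boxes, min_area=500):
    """Drop tiny boxes (likely noise) and sort the rest top-to-bottom,
    then left-to-right.

    `boxes` is a list of (x, y, w, h) tuples, the shape returned by
    cv2.boundingRect. `min_area` is an illustrative value, not a tuned one.
    """
    kept = [b for b in boxes if b[2] * b[3] >= min_area]
    return sorted(kept, key=lambda b: (b[1], b[0]))


def save_questions(conn, texts):
    """Store one row per question. Table and column names are hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions ("
        "id INTEGER PRIMARY KEY, position INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO questions (position, body) VALUES (?, ?)",
        list(enumerate(texts, start=1)),
    )
    conn.commit()


# Made-up boxes: two question-sized regions and one speck of noise.
boxes = [(40, 300, 500, 120), (40, 20, 500, 110), (10, 10, 3, 3)]
ordered = reading_order(boxes)

# Placeholder strings stand in for the OCR result of each cropped region.
texts = [f"question at y={y}" for (_, y, _, _) in ordered]

conn = sqlite3.connect(":memory:")
save_questions(conn, texts)
rows = conn.execute(
    "SELECT position, body FROM questions ORDER BY position"
).fetchall()
```

In a real run, each placeholder string would instead come from cropping the page with `img[y:y + h, x:x + w]` and passing the crop to `pytesseract.image_to_string`. Sorting matters because `cv2.findContours` does not promise to return contours in reading order.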
Upvotes: 1