Ashwin Sabapathy

Reputation: 11

How to detect multiple blocks of text from an image of a document?

I have images of math question papers with multiple questions per page (example: math questions image). I want to use Python to extract the contents of each question separately and store them in a database table. From my research, I have a rough idea of the workflow: pre-process the image --> find the contours of each question --> crop those regions and send the individual images to pytesseract --> store the transcribed text.

I was very happy to find a great thread about a similar problem, but when I tried that approach on my image, the ROI that was identified covered the whole page. In other words, it identified all the questions as one block of text.

How do I make OpenCV recognize multiple ROIs within a page and draw bounding boxes? Is there something different to be done during the pre-processing?

Please suggest an approach - thanks so much!

Upvotes: 1

Views: 1869

Answers (1)

Sivaram Rasathurai

Reputation: 6333

  1. Convert the image to grayscale.

  2. Apply Otsu's threshold, which binarizes the image and separates the text from the background.

  3. Specify the structuring element shape and the kernel size. The kernel size controls how much nearby text is merged, and therefore how large the detected rectangles are.

  4. Dilate the thresholded image with that kernel. Dilation thickens the text so that neighbouring characters and lines join into connected blocks.

  5. Find the contours of the dilated image.

  6. Loop through the contours and draw each one's bounding rectangle with cv2.rectangle.

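import cv2

# Load the page and convert it to grayscale
img = cv2.imread("text.jpg")
if img is None:
    raise FileNotFoundError("Could not read text.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Light blur to suppress noise before thresholding
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Inverted Otsu threshold: text becomes white on a black background
ret, thresh1 = cv2.threshold(blur, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)

# Rectangular kernel; a larger kernel merges more neighbouring text into one block
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))

# Dilate so that characters and lines close to each other join up
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)

# Keep only the outermost contour of each merged region
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)

for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    # Draw the bounding rectangle directly on the loaded image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite('drawed.png', img)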

Sample output image

Upvotes: 1
