Sreekiran A R

Reputation: 3421

Detect checkboxes in a form using OpenCV (Python)

Given a dental form as input, I need to find all the checkboxes in the form using image processing. I have posted my current approach as an answer below. Is there a better approach that also works on low-quality documents?

sample input:

masked input image

Upvotes: 4

Views: 4868

Answers (2)

Sreekiran A R

Reputation: 3421

Here is one approach that solves the problem:

import cv2
import numpy as np
image=cv2.imread('path/to/image.jpg')

### binarising image
gray_scale=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
th1,img_bin = cv2.threshold(gray_scale,150,255,cv2.THRESH_BINARY)  # maxval 255 so the result is strictly 0/255

binary

Defining vertical and horizontal kernels

lineWidth = 7
lineMinWidth = 55
kernal1 = np.ones((lineWidth,lineWidth), np.uint8)
kernal1h = np.ones((1,lineWidth), np.uint8)
kernal1v = np.ones((lineWidth,1), np.uint8)

kernal6 = np.ones((lineMinWidth,lineMinWidth), np.uint8)
kernal6h = np.ones((1,lineMinWidth), np.uint8)
kernal6v = np.ones((lineMinWidth,1), np.uint8)

Detect horizontal lines

img_bin_h = cv2.morphologyEx(~img_bin, cv2.MORPH_CLOSE, kernal1h) # bridge small gaps in horizontal lines
img_bin_h = cv2.morphologyEx(img_bin_h, cv2.MORPH_OPEN, kernal6h) # keep only horizontal lines by eroding everything else in the horizontal direction

horizontal

Detect vertical lines

img_bin_v = cv2.morphologyEx(~img_bin, cv2.MORPH_CLOSE, kernal1v)  # bridge small gaps in vertical lines
img_bin_v = cv2.morphologyEx(img_bin_v, cv2.MORPH_OPEN, kernal6v)  # keep only vertical lines by eroding everything else in the vertical direction

vertical image

Merge the vertical and horizontal lines to get the blocks, then add one pass of dilation to close small gaps.

### force the image to strictly binary values (0 or 255)
def fix(img):
    img[img>127]=255
    img[img<=127]=0   # <= so that a value of exactly 127 is not left unchanged
    return img

img_bin_final = fix(fix(img_bin_h)|fix(img_bin_v))

finalKernel = np.ones((5,5), np.uint8)
img_bin_final=cv2.dilate(img_bin_final,finalKernel,iterations=1)

final binary

Apply connected-component analysis to the binary image to get the required blocks.

ret, labels, stats,centroids = cv2.connectedComponentsWithStats(~img_bin_final, connectivity=8, ltype=cv2.CV_32S)

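### skip the first two components: label 0 is the line pixels (zero in the inverted image), label 1 is normally the page area around the boxes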
for x,y,w,h,area in stats[2:]:
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,255,0),2)

final image

Upvotes: 13

Rahul Kedia

Reputation: 1430

You can also use contours for this problem.

import cv2
import numpy as np

# Reading the image in grayscale and thresholding it
Image = cv2.imread("findBox.jpg", 0)
ret, Thresh = cv2.threshold(Image, 100, 255, cv2.THRESH_BINARY)

Now dilate and then erode, with two iterations each, to join the dotted lines present inside the boxes.

kernel = np.ones((3, 3), dtype=np.uint8)
Thresh = cv2.dilate(Thresh, kernel, iterations=2)
Thresh = cv2.erode(Thresh, kernel, iterations=2)

Find contours in the image with the cv2.RETR_TREE flag to get all contours along with their parent-child relations.

Contours, Hierarchy = cv2.findContours(Thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]  # [-2:] works on OpenCV 3.x (three return values) and 4.x (two)

Now all the boxes are detected, along with all the letters in the image. We have to eliminate the detected letters, very small contours (due to noise), and also those boxes that contain smaller boxes inside them.

For this, I loop over all the detected contours and save three values for each contour in three separate lists:

  • 1st value: area of the contour (number of pixels the contour encloses).
  • 2nd value: the contour's bounding rectangle.
  • 3rd value: ratio of the contour's area to the area of its bounding rectangle.
Areas = []
Rects = []
Ratios = []
for Contour in Contours:
    # Getting bounding rectangle
    Rect = cv2.boundingRect(Contour)

    # Drawing contour on new image and finding number of white pixels for contour area
    C_Image = np.zeros(Thresh.shape, dtype=np.uint8)
    cv2.drawContours(C_Image, [Contour], -1, 255, -1)
    ContourArea = np.sum(C_Image == 255)

    # Area of the bounding rectangle
    Rect_Area = Rect[2]*Rect[3]
    
    # Calculating ratio as explained above
    Ratio = ContourArea / Rect_Area
   
    # Storing data
    Areas.append(ContourArea)
    Rects.append(Rect)
    Ratios.append(Ratio)

Filtering out undesired contours:

  • Get the indices of those contours that have an area greater than 3600 (threshold value for this image) and a Ratio >= 0.99. The ratio measures how much of its bounding rectangle the contour fills. Since the desired contours are rectangular, this ratio is expected to be 1.0 for them (0.99 leaves a small tolerance for noise).
BoxesIndices = [i for i in range(len(Contours)) if Ratios[i] >= 0.99 and Areas[i] > 3600]
  • The final contours are those at indices "BoxesIndices" that either have no child contour (this extracts the innermost contours) or whose child contour is not itself one of the contours at indices "BoxesIndices".
FinalBoxes = [Rects[i] for i in BoxesIndices if Hierarchy[0][i][2] == -1 or BoxesIndices.count(Hierarchy[0][i][2]) == 0]

Final output image
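One caveat worth noting: the third hierarchy entry is only the first child of a contour, and any further children are reached through the "next" links. The sketch below uses a hand-written hierarchy (not output from the form image) to show a case where checking only the first child keeps an outer box, and a possible variant that walks all direct children. The helper `children_of` and the sample indices are hypothetical.

```python
import numpy as np

# Hand-written hierarchy in OpenCV's layout: [next, previous, first_child, parent].
# Contour 0 is an outer box whose children are contour 1 (a letter) and contour 2 (a box).
hierarchy = np.array([[[-1, -1,  1, -1],
                       [ 2, -1, -1,  0],
                       [-1,  1, -1,  0]]])
box_indices = [0, 2]   # contours assumed to have passed the area and ratio tests
box_set = set(box_indices)

def children_of(hierarchy, i):
    """Yield every direct child of contour i by following the 'next' links."""
    child = hierarchy[0][i][2]
    while child != -1:
        yield int(child)
        child = hierarchy[0][child][0]

# Same test as in the answer: look at the first child only.
first_child_only = [i for i in box_indices
                    if hierarchy[0][i][2] == -1 or hierarchy[0][i][2] not in box_set]

# Variant: reject a box if any of its direct children is itself a box.
all_children = [i for i in box_indices
                if not any(c in box_set for c in children_of(hierarchy, i))]

print(first_child_only, all_children)
```

In this example the first-child test keeps both contours, because the first child of contour 0 is the letter, while the variant keeps only the inner box. Whether this matters depends on the form; it only changes the result when a box contains both a non-box contour and a smaller box.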

Upvotes: 1
