Reputation: 1339
I am trying to segment the questions in the image below. The only clue I have is the bold question number, which is indented by a tab space. I want to find the bold numbers (4, 5, 6 in this case) so that I can get their x and y coordinates and split the image into 3 separate questions. How can I get these coordinates, or how should I approach this problem?
I am using scikit-image for image processing.
Upvotes: 2
Views: 5747
Reputation: 2854
Your image looks quite simple, so the text can be segmented fairly easily with contour detection around the dilated components. Here are the detailed steps:
1) Binarize the image and invert it so that the text is white on black, which makes the morphological operations easier.
2) Dilate the image in the horizontal direction only, using a long horizontal kernel, say of shape (20, 1). The code below uses a morphological close, which is a dilation followed by an erosion.
3) Find the contours of all the connected components and get their coordinates.
4) Use the dimensions and coordinates of these bounding boxes to segment the questions.
Here is a Python implementation of these steps:
# Text segmentation
import cv2
import numpy as np
rgb = cv2.imread(r'D:\Image\st4.png')
small = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
# Threshold the image (inverted, with Otsu's method choosing the threshold)
_, bw = cv2.threshold(small, 0.0, 255.0, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# Use a wide, 1-pixel-high kernel, since text lines are horizontal components
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20, 1))
connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)
# Find all the contours; taking the last two return values works with both
# OpenCV 3 (three values) and OpenCV 4 (two values)
contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
# Segment the text lines
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(rgb, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2)
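The code above only draws the boxes; step 4 is left open. One possible way to do it is sketched below, under the assumption that each question number ends up as its own box at the left-most x position and that the question text starts further right. The function name split_questions and the margin_tol parameter are hypothetical and not part of OpenCV or the original answer.

```python
def split_questions(boxes, image_height, margin_tol=10):
    """Group text-line boxes into vertical bands, one band per question.

    boxes: list of (x, y, w, h) tuples, e.g. collected from cv2.boundingRect.
    Assumption: a box whose x is within margin_tol pixels of the left-most x
    is a question number; everything else is question text.
    Returns a list of (y_start, y_end) row ranges.
    """
    if not boxes:
        return []
    # Left-most x among all boxes: assumed to be the question-number column.
    left = min(x for x, _, _, _ in boxes)
    # Top edges of the boxes that start at that column.
    starts = sorted({y for x, y, _, _ in boxes if x - left <= margin_tol})
    # Each question runs from its number down to the next number
    # (or to the bottom of the image for the last one).
    ends = starts[1:] + [image_height]
    return list(zip(starts, ends))


# Made-up boxes: three numbers at x ~ 10, text lines at x = 60.
example_boxes = [
    (10, 20, 15, 12), (60, 20, 300, 12), (60, 40, 280, 12),
    (10, 80, 15, 12), (60, 80, 310, 12),
    (12, 140, 15, 12), (60, 140, 200, 12),
]
bands = split_questions(example_boxes, image_height=200)
```

With real data, the boxes would be gathered in the contour loop, and each band could then be cropped with NumPy slicing such as image[y_start:y_end] on an unmodified copy of the image. The tolerance probably needs tuning per scan, and wrapped text that starts at the left margin would break the assumption.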
Upvotes: 3