Kaushik

Reputation: 1339

Find the coordinate of a specific text in an image

I am trying to segment the questions in the image below. The only clue I have is the bold question number, which is separated from the question text by a tab space. I want to find the bold numbers (4, 5 and 6 in this case) so that I can get their x and y coordinates and split the image into 3 separate questions. How can I get these coordinates, or how should I approach this problem?

I am using scikit-image for image processing.

[image: page of questions numbered 4, 5 and 6]

Upvotes: 2

Views: 5747

Answers (1)

flamelite

Reputation: 2854

Your image looks quite simple, so the text can be segmented fairly easily by detecting contours around the dilated components. Here are the detailed steps:

1) Binarize the image and invert it for easy morphological operations.

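2) Dilate the image in the horizontal direction only (the code below uses a morphological close, i.e. dilation followed by erosion) with a long horizontal kernel, say of shape (20, 1), so that the characters of each line merge into one component.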

3) Find contours of all the connected components and get their coordinates.

4) Use the dimensions and coordinates of these bounding boxes to segment the questions.

Here is the Python implementation of the same:

# Text segmentation 
import cv2
import numpy as np

rgb = cv2.imread(r'D:\Image\st4.png')
small = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)

# threshold the image (Otsu picks the level; INV makes the text white on black)
_, bw = cv2.threshold(small, 0.0, 255.0, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# close gaps with a wide horizontal kernel, since text lines are horizontal components
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20, 1))
connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)

# find all the contours; findContours returns 3 values in OpenCV 3.x and 2 in
# other versions, so keep only the last two (contours, hierarchy)
contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

#Segment the text lines
for idx in range(len(contours)):
    x, y, w, h = cv2.boundingRect(contours[idx])
    cv2.rectangle(rgb, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2)
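The loop above only draws the boxes; step 4 is left to the reader. One possible way to do it is sketched below. It is not part of the original answer, and it assumes that each question number ends up as its own narrow box at the left margin (i.e. the tab gap is wider than the closing kernel). The function name `split_into_questions` and the parameters `max_marker_width` and `x_tolerance` are hypothetical and would need tuning for a real scan.

```python
def split_into_questions(boxes, image_height, max_marker_width=40, x_tolerance=10):
    """Group text boxes into questions.

    boxes: list of (x, y, w, h) tuples, e.g. collected from cv2.boundingRect.
    Returns a list of (y_start, y_end) row ranges, one per question.
    Assumption: a question number is a narrow box starting at the left margin.
    """
    if not boxes:
        return []
    left_margin = min(x for x, _, _, _ in boxes)
    # Top edges of the boxes that look like question numbers, top to bottom.
    starts = sorted(
        y for x, y, w, h in boxes
        if x - left_margin <= x_tolerance and w <= max_marker_width
    )
    # Each question runs from its number down to the next number (or page end).
    ends = starts[1:] + [image_height]
    return list(zip(starts, ends))


# Made-up boxes for a 250-pixel-tall page: three numbers plus their text lines.
example_boxes = [
    (10, 20, 15, 18), (60, 20, 400, 18), (60, 45, 380, 18),
    (10, 100, 15, 18), (60, 100, 390, 18),
    (10, 180, 15, 18), (60, 180, 300, 18),
]
print(split_into_questions(example_boxes, 250))
```

With the boxes collected in the loop (for example by appending `(x, y, w, h)` to a list), each returned range can be used to crop one question with `rgb[y_start:y_end, :]`.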

Output image: [image: the input page with a green bounding box drawn around each detected text segment]
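Since the question mentions scikit-image, the same steps can probably be reproduced there as well. The following is only a rough sketch, not the code that produced the image above; the function name `text_boxes` and the parameter `kernel_width` are hypothetical, and it assumes dark text on a light grayscale page. Note that NumPy shapes are (rows, cols), so the horizontal footprint is `(1, 20)` rather than OpenCV's `(20, 1)`.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.morphology import closing


def text_boxes(gray, kernel_width=20):
    """Return (x, y, w, h) boxes of horizontally merged text components."""
    # Dark text on a light page: pixels at or below the Otsu threshold are text.
    binary = (gray <= threshold_otsu(gray)).astype(np.uint8)
    # A 1-row footprint closes gaps along the horizontal direction only.
    footprint = np.ones((1, kernel_width), dtype=np.uint8)
    closed = closing(binary, footprint)
    boxes = []
    for region in regionprops(label(closed > 0)):
        min_row, min_col, max_row, max_col = region.bbox
        boxes.append((min_col, min_row, max_col - min_col, max_row - min_row))
    # Sort top to bottom, then left to right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Synthetic page: two "words" 5 px apart on one line, one word on a second line.
page = np.full((60, 300), 255, dtype=np.uint8)
page[10:20, 30:40] = 0
page[10:20, 45:120] = 0
page[35:45, 30:80] = 0
boxes = text_boxes(page)
print(boxes)
```

The boxes use the same `(x, y, w, h)` layout as `cv2.boundingRect`, so any later grouping logic can work with either library.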

Upvotes: 3
