Alex W.

Reputation: 174

Cluster contours separated by whitespace (line segmentation)

I have a large number of line snippets, e.g.:

line snippet

With some OpenCV magic (I'm still trying to understand how OpenCV works) I can get the contours of the characters on an empty canvas:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img=cv2.imread(example)
imgGray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

empty = np.zeros(img.shape[0:2])

ret,imgThresh = cv2.threshold(imgGray,249,250,cv2.THRESH_OTSU)
kernel_erosion = np.ones((5,5),np.uint8)
imgErode = cv2.erode(imgThresh,kernel_erosion,iterations = 2)

kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT,(1,3))
imgOpen = cv2.morphologyEx(imgErode, cv2.MORPH_OPEN, kernel_open)

kernel_dilate = np.ones((1,1),np.uint8)
imgDilate = cv2.dilate(imgOpen,kernel_dilate,iterations = 4)

contours,_ = cv2.findContours(imgDilate, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

new_img = cv2.drawContours(empty, contours, -1, (255,255,255), thickness=cv2.FILLED)
plt.imshow(new_img)
plt.show()

line snippet with character contours

The whitespace-separated lines are – to a human eye – clearly distinguishable. I am looking for a straightforward way to select the different clusters of contours (viz. lines), either by selecting the whitespace in between the lines or by clustering contours that are sufficiently close to one another and are on the same line.

A pragmatic statistical approach that just counts the pixels in each row does not seem to be robust enough, as lines can be skewed.

lines snippet with projection
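
For reference, that projection boils down to something like the following rough sketch (it reuses imgThresh, np, and plt from the code above; rowCounts is just an illustrative name):

# Count the dark (text) pixels in every row of the thresholded image:
rowCounts = np.count_nonzero(imgThresh == 0, axis=1)

# Rows with a count near zero are candidate blank lines:
plt.plot(rowCounts)
plt.show()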

Any ideas as to how to segment the line snippets would be very appreciated!

Upvotes: 0

Views: 407

Answers (1)

stateMachine

Reputation: 5805

Here's a possible solution. The idea is to reduce the image to a column, where each row value is the sum of all the intensity values across that row. The areas between the text should exhibit the lowest values, giving us the (approximate) location of the blank lines. These are the steps:

  1. Convert the image to grayscale
  2. Get a binary image via Otsu's Thresholding
  3. Reduce the image to a column that contains the sum of each image row
  4. Set a threshold and find the locations of the minimum sum values
  5. We expect to have multiple minimum locations, so we will get an average of the minimum location points
  6. Use this info to crop the image between both blank spaces

Let's check out the code:

# imports:
import cv2
import numpy as np

# Set image path
imagePath = "D://opencvImages//"
imageName = "b6yZO.png"

# Read image in Grayscale mode:
inputImage = cv2.imread(imagePath + imageName, cv2.IMREAD_GRAYSCALE)

# Convert Grayscale to BGR:
inputImage = cv2.cvtColor(inputImage, cv2.COLOR_GRAY2BGR)

# Store a copy for results:
inputCopy = inputImage.copy()

# Convert BGR back to grayscale:
grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

Now, your image seems to be already grayscale (watch out, because depending on the library you use to process images, this could give you trouble). I load the image in GRAYSCALE mode, convert it to BGR to have a color copy (so I can draw some results later), and then convert it back to grayscale to continue processing. The binary image I get is this:

Nothing fancy, just an ordinary binary image. Note that the black color is relatively constant across all rows in the blank spaces between the paragraphs. Alright, next, reduce the image to a column using SUM mode:

# Reduce the image to an n rows x 1 column matrix:
reducedImg = cv2.reduce(binaryImage, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S)

This reduces the image. Watch out for the data types: this particular operation yields 32-bit signed integers to store the row sums. Now, let's try to get the minimum areas. There will be multiple, as the original has multiple rows that make up the blank spaces. I'll get the column's maximum value and set a threshold as a fraction of this value:

# Get the maximum element from the reduced image array:
maxElement = np.amax(reducedImg)

# Define a threshold and accumulate
# the coordinate of the points:
threshValue = 0.1 * maxElement

# Get the height (or length) of the array:
reducedHeight = reducedImg.shape[0]

# We will store the Y coordinate here:
Y = []

I've set the threshold to be 10% of the maximum sum value. Also, I've prepared some variables before traversing the image. reducedHeight is the length of the array and Y is a list that will store all the coordinates whose row sums are below the threshold. Let's loop through the array:

# Search for Y coordinates lower
# than the threshold:
for i in range(reducedHeight):
    # Get current value from column:
    currentValue = reducedImg[i]
    # Check out if the value is below the threshold:
    if currentValue < threshValue:
        # Store the value:
        Y.append(i)
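
As a side note, the same selection can be done without an explicit loop; this sketch is equivalent (it produces a NumPy array instead of a list, which the reshape further down handles just the same):

# Vectorized alternative: indices of all rows whose sum is below the threshold:
Y = np.where(reducedImg.flatten() < threshValue)[0]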

Nice. We have all the points we need stored in Y. If we plot these points as lines, we can visualize the cluster of lines representing each paragraph. This is the image:
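
If you want to reproduce that visualization, a minimal sketch could look like this (it reuses Y and inputImage from above and draws on a separate copy, so inputCopy stays untouched for the later steps):

# Visualize every low-sum row as a horizontal line (sketch only):
pointsVis = inputImage.copy()
for y in Y:
    cv2.line(pointsVis, (0, int(y)), (pointsVis.shape[1], int(y)), (0, 0, 255), 1)

cv2.imshow("Low-sum rows", pointsVis)
cv2.waitKey(0)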

Now, as there are multiple lines, we need an average. In fact, there are two clusters in Y, each cluster represents a paragraph, and we have two paragraphs in the image. There are multiple ways of doing this, but at the end of the day, we need two average values. Seems like a job for K-Means, then, because that is exactly what it does: it receives data, clusters it, and returns the average centers of said clusters. Let's apply K-Means to our Y array. But first, the array needs some data handling:

# Reshape the array for K-means
Y = np.array(Y)
Y = Y.reshape(-1,1)

# K-means operates on 32-bit float data:
floatPoints = np.float32(Y)

# Set the convergence criteria and call K-means:
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(floatPoints, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Print the centers:
print(center)

Output:

[[24.000002]
 [95.      ]]

Yeah, those are our centers. One value for cluster 1 (paragraph 1) and another one for cluster 2 (paragraph 2). The cool thing about K-means is that there could be more paragraphs; they will still be clustered, and it will always return the appropriate centers of said clusters (as long as you pass the matching number of clusters). Alright, let's use our new info to check out our two lines:

# Draw the average lines:
for p in range(len(center)):

    # Get line points:
    x1 = 0
    y1 = int(center[p][0])
    x2 = int(inputCopy.shape[1])
    y2 = y1

    cv2.line(inputCopy, (x1, y1), (x2, y2), (0, 255, 0), 1)
    cv2.imshow("Lines", inputCopy)
    cv2.waitKey(0)

These are the centers (in green - the lines are there, but the image is too small to see them):
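
One caveat before cropping: cv2.kmeans does not guarantee any particular ordering of the returned centers, so if the code below assumes center[0] is the upper blank line, it may be safer to sort them first. A small sketch:

# Sort the centers top-to-bottom so center[0] is always the upper blank line:
center = center[np.argsort(center[:, 0])]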

We can finally crop the image with this info:

# Crop image (note: "w" and "h" here are absolute end coordinates, not width/height):
x = 0
y = int(center[0][0])
w = inputCopy.shape[1]
h = int(center[1][0])

imgCrop = inputImage[y:h, x:w]
cv2.imshow("imgCrop", imgCrop)
cv2.waitKey(0)

Which yields:

Upvotes: 4
