I have a large number of line snippets, e.g.:
With some OpenCV magic (I'm still trying to understand how OpenCV works), I can get the contours of the characters on an empty canvas:
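import cv2
import numpy as np
import matplotlib.pyplot as plt
# `example` holds the path to one of the line snippet images
img = cv2.imread(example)
imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Blank single-channel canvas to draw the filled contours on
empty = np.zeros(img.shape[0:2], dtype=np.uint8)
# With THRESH_OTSU the threshold argument (249) is ignored and computed automatically
ret, imgThresh = cv2.threshold(imgGray, 249, 250, cv2.THRESH_OTSU)
kernel_erosion = np.ones((5, 5), np.uint8)
imgErode = cv2.erode(imgThresh, kernel_erosion, iterations=2)
kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))
imgOpen = cv2.morphologyEx(imgErode, cv2.MORPH_OPEN, kernel_open)
# Note: dilating with a 1x1 kernel leaves the image unchanged
kernel_dilate = np.ones((1, 1), np.uint8)
imgDilate = cv2.dilate(imgOpen, kernel_dilate, iterations=4)
# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values
contours, _ = cv2.findContours(imgDilate, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
new_img = cv2.drawContours(empty, contours, -1, (255, 255, 255), thickness=cv2.FILLED)
plt.imshow(new_img)
plt.show()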
To a human eye, the whitespace-separated lines are clearly distinguishable. I am looking for a straightforward way to select the different clusters of contours (viz. lines), either by selecting the whitespace in between the lines or by clustering contours that are sufficiently close to one another and lie on the same line.
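For the second idea (clustering contours that lie on the same line), one hedged option is to work on bounding boxes rather than raw contours: with the contours from the code above, `boxes = [cv2.boundingRect(c) for c in contours]` gives `(x, y, w, h)` tuples. The sketch below is plain Python and only an assumption of how such grouping could look; `group_boxes_into_lines` and its `min_overlap` parameter are hypothetical names. It greedily puts a box into an existing line when their vertical extents overlap enough, growing the line's span as it goes.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), the format returned by cv2.boundingRect


def group_boxes_into_lines(boxes: List[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Greedily group boxes whose vertical extents overlap into text lines."""
    lines: List[List[Box]] = []
    line_spans: List[Tuple[int, int]] = []  # (top, bottom) of each line found so far
    # Process boxes from top to bottom so each line grows downward.
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        top, bottom = y, y + h
        for idx, (ltop, lbottom) in enumerate(line_spans):
            overlap = min(bottom, lbottom) - max(top, ltop)
            # Require the overlap to cover a fraction of the smaller height.
            if overlap > min_overlap * min(h, lbottom - ltop):
                lines[idx].append(box)
                line_spans[idx] = (min(top, ltop), max(bottom, lbottom))
                break
        else:
            # No existing line matched: start a new one.
            lines.append([box])
            line_spans.append((top, bottom))
    # Order the boxes inside each line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

Because a line's span grows as boxes are added, mildly skewed lines may still chain together, but this is a heuristic: strongly skewed lines or tall boxes that touch two lines can merge them.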
A pragmatic statistical approach that just counts the pixels in each row does not seem robust enough, as lines can be skewed.
Any ideas on how to segment the line snippets would be much appreciated!
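Here's a possible solution. The idea is to reduce the image to a single column, where each value is the sum of all intensity values in the corresponding row. The areas between the text should exhibit the lowest values, giving us the (approximate) location of the blank lines. These are the steps:
1. Threshold the image with Otsu's method to get a binary image.
2. Reduce the binary image to a column of row sums with cv2.reduce.
3. Collect the row coordinates whose sum is below a threshold.
4. Cluster those coordinates with K-means to get the center of each blank area.
5. Crop the image between the centers.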
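In NumPy terms, the reduction is just a per-row sum. The small sketch below uses a hypothetical toy binary image (white text on black, as produced by an inverted threshold) to show what the column of row sums looks like: blank rows sum to zero, text rows do not.

```python
import numpy as np

# Hypothetical toy binary image: two "text lines" separated by blank rows.
binary = np.zeros((8, 6), dtype=np.uint8)
binary[1:3, 1:5] = 255  # first text line (rows 1-2, 4 white pixels per row)
binary[5:7, 0:6] = 255  # second text line (rows 5-6, 6 white pixels per row)

# Sum each row. This matches cv2.reduce(binary, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S),
# except NumPy returns a 1-D array instead of an (n, 1) column.
row_sums = binary.sum(axis=1, dtype=np.int64)
print(row_sums.tolist())  # [0, 1020, 1020, 0, 0, 1530, 1530, 0]
```

The wide integer type matters: a single row of 255-valued pixels already overflows an 8-bit integer.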
Let's check out the code:
# imports:
import cv2
import numpy as np
# Set image path
imagePath = "D://opencvImages//"
imageName = "b6yZO.png"
# Read image in Grayscale Mode:
inputImage = cv2.imread(imagePath + imageName, cv2.IMREAD_GRAYSCALE)
# Convert Grayscale to BGR:
inputImage = cv2.cvtColor(inputImage, cv2.COLOR_GRAY2BGR)
# Store a copy for results:
inputCopy = inputImage.copy()
# Convert BGR back to grayscale:
grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu (inverted, so the text becomes white):
threshValue, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
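With cv2.THRESH_OTSU, the threshold passed in (0 here) is ignored and OpenCV picks one automatically, returning it as threshValue. The NumPy sketch below is an illustration of Otsu's method under the assumption of an 8-bit grayscale input, not OpenCV's actual implementation; `otsu_threshold` is a hypothetical helper. It chooses the level that maximizes the between-class variance of the two pixel groups.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the level t maximizing between-class variance (Otsu's method).

    Expects a uint8 image. Pixels <= t form one class and pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # weight of the "dark" class for each t
    mu = np.cumsum(prob * levels)    # cumulative mean up to each t
    mu_total = mu[-1]
    # Between-class variance: (mu_T * omega - mu)^2 / (omega * (1 - omega)).
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))
```

On a clearly bimodal image (text and background), any level between the two modes separates them; OpenCV may pick a different level within that range than this sketch does.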
Your image seems to be grayscale already (watch out: depending on the library you use to process images, this could give you trouble), so I load it in GRAYSCALE mode. Then I convert it to BGR to have a color copy (so I can draw some results later) and convert it back to grayscale to continue processing. The binary image I get is this:
Nothing fancy, just an ordinary binary image. Note that the black color is relatively constant across all rows in the blank paragraphs (the gaps between the text). Alright, next, reduce the image to a single column using SUM mode:
# Reduce the image to an n rows x 1 column matrix of row sums:
reducedImg = cv2.reduce(binaryImage, 1, cv2.REDUCE_SUM, dtype=cv2.CV_32S)
This reduces the image. Watch out with the data types: this particular operation yields 32-bit signed integers to store the row sums, because the sum of a row of 255-valued pixels does not fit in 8 bits. Now, let's find the minimum areas. There will be several candidate rows, because each blank space in the original spans multiple pixel rows. I'll get the column's maximum value and set a threshold as a fraction of this value:
# Get the maximum element from the reduced image array:
maxElement = np.amax(reducedImg)
# Define a threshold and accumulate
# the coordinate of the points:
threshValue = 0.1 * maxElement
# Get the height (or length) of the array:
reducedHeight = reducedImg.shape[0]
# We will store the Y coordinate here:
Y = []
I've set the threshold to 10% of the maximum sum value. I've also prepared some variables before traversing the image: reducedHeight is the length of the array, and Y is a list that will store all the row coordinates whose sum is below the threshold. Let's loop through the array:
# Search for Y coordinates lower
# than the threshold:
for i in range(reducedHeight):
    # Get current value from column:
    currentValue = reducedImg[i]
    # Check out if the value is below the threshold:
    if currentValue < threshValue:
        # Store the value:
        Y.append(i)
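The loop can also be written as a single vectorized NumPy expression. This is a sketch on a hypothetical (n, 1) column shaped like the output of cv2.reduce; np.flatnonzero returns the indices of the rows below the threshold, i.e. the same values the loop appends to Y.

```python
import numpy as np

# Hypothetical reduced column, shaped (n, 1) like cv2.reduce(..., 1, ...) output.
reducedImg = np.array([[0], [1020], [1020], [0], [0], [1530], [1530], [0]], dtype=np.int32)

threshValue = 0.1 * np.amax(reducedImg)
# Indices of the rows whose sum is below the threshold.
Y = np.flatnonzero(reducedImg[:, 0] < threshValue)
print(Y.tolist())  # [0, 3, 4, 7]
```

The result is already a NumPy array, so the later Y.reshape(-1, 1) step works on it directly.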
Nice. We have all the points we need stored in Y. If we plot these points as lines, we can visualize the cluster of lines representing each blank paragraph. This is the image:
Now, as there are multiple lines per gap, we need an average. In fact, there are two clusters in Y: each cluster represents a blank paragraph, and we have two of them in the image. There are multiple ways of doing this, but at the end of the day, we need two average values. This seems like a job for K-Means, because that is exactly what it does: it receives data, clusters it and returns the average centers of said clusters. Let's apply K-Means to our Y array. But first, the array needs some data handling:
# Reshape the array for K-means
Y = np.array(Y)
Y = Y.reshape(-1,1)
# K-means operates on 32-bit float data:
floatPoints = np.float32(Y)
# Set the convergence criteria and call K-means:
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(floatPoints, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Print the centers:
print(center)
Output:
[[24.000002]
[95. ]]
Yeah, those are our centers: one value for cluster 1 (blank paragraph 1) and another one for cluster 2 (blank paragraph 2). K-means also handles more paragraphs, as long as you set K (the second argument to cv2.kmeans, 2 here) to the number of clusters you expect; it then returns one center per cluster. Note that the centers are not guaranteed to come out sorted. Alright, let's use our new info to check out our two lines:
# Draw the average lines:
for p in range(len(center)):
    # Get line points:
    x1 = 0
    y1 = int(center[p][0])
    x2 = int(inputCopy.shape[1])
    y2 = y1
    cv2.line(inputCopy, (x1, y1), (x2, y2), (0, 255, 0), 1)
cv2.imshow("Lines", inputCopy)
cv2.waitKey(0)
These are the centers (in green; the lines are there, but the image is too small to see them):
We can finally crop the image with this info:
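# Crop image (sort the centers first, since K-means may return them in any order):
centers = sorted(int(c[0]) for c in center)
x = 0
y = centers[0]
w = inputCopy.shape[1]
h = centers[1]
imgCrop = inputImage[y:h, x:w]
cv2.imshow("imgCrop", imgCrop)
cv2.waitKey(0)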
Which yields:
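K-means needs the number of gaps (K) in advance. A hedged alternative, not part of the approach above: because the blank rows in Y are consecutive indices, they can be split into runs wherever the step between neighbors exceeds 1, and each run's midpoint used as a cut. The helpers `blank_runs` and `line_slices` are hypothetical names for this sketch.

```python
import numpy as np


def blank_runs(blank_rows: np.ndarray) -> list:
    """Split sorted row indices into runs of consecutive rows: [(start, end), ...]."""
    if blank_rows.size == 0:
        return []
    # A new run starts wherever the step to the previous index is larger than 1.
    breaks = np.flatnonzero(np.diff(blank_rows) > 1)
    starts = np.concatenate(([blank_rows[0]], blank_rows[breaks + 1]))
    ends = np.concatenate((blank_rows[breaks], [blank_rows[-1]]))
    return list(zip(starts.tolist(), ends.tolist()))


def line_slices(blank_rows: np.ndarray) -> list:
    """Return (top, bottom) row ranges between the midpoints of successive blank runs."""
    centers = [(s + e) // 2 for s, e in blank_runs(blank_rows)]
    # Runs come out in top-to-bottom order, so no sorting is needed here.
    return list(zip(centers[:-1], centers[1:]))
```

Each (top, bottom) pair can then be used as inputImage[top:bottom, :] to crop one line. This assumes the image has a blank margin above the first line and below the last one; otherwise those lines fall outside the returned ranges.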