Reputation: 341
Assuming I have a group of lines segments like the red lines (or green lines) in this picture
I want to know how can I replace them with just one line segment that approximates them best. Or maybe you can advise what to search for since this might be a common problem in statistics.
Problem background: This comes actually from using OpenCV's probabilistic Hough Transform. I want to detect the corners of a piece of paper. When I apply it to an image I get on the edges a group of lines that I want to convert to a single line continuous segment.
One way to do it that I was thinking of, is to take from the line a couple of points and then to use line regression to get the equation of the line. Once I have that I should just cut it to a segment.
Upvotes: 6
Views: 2191
Reputation: 46630
Here's a potential solution:
Obtain binary image. Load image, convert to grayscale, and Otsu's threshold
Perform morphological operations. We morph close to combine the contours into a single contour
Find convex hull. We create a blank mask then find the convex hull on the binary image. We draw the lines onto the mask, morph close, then find contours and fill in the outline to get a solid image
Perform linear regression. We find the line of best fit on the binary image then draw this resulting line onto a new mask
Bitwise convex hull and line mask together. We bitwise-and both masks together and draw this resulting contour onto the original image.
Here's a visualization of each step:
Using this screenshotted input image
Binary image ->
Morph close
# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
Convex hull outline ->
convex hull filled
# Create line mask and convex hull mask
line_mask = np.zeros(image.shape, dtype=np.uint8)
convex_mask = np.zeros(image.shape, dtype=np.uint8)
# Find convex hull on the binary image
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnt = cnts[0]
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
for i in range(defects.shape[0]):
s,e,f,d = defects[i,0]
start = tuple(cnt[s][0])
end = tuple(cnt[e][0])
far = tuple(cnt[f][0])
cv2.line(convex_mask,start,end,[255,255,255],5)
# Morph close the convex hull mask, find contours, and fill in the outline
convex_mask = cv2.cvtColor(convex_mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
convex_mask = cv2.morphologyEx(convex_mask, cv2.MORPH_CLOSE, kernel, iterations=3)
cnts = cv2.findContours(convex_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.fillPoly(convex_mask, cnts, (255,255,255))
Linear regression
# Perform linear regression on the binary image
[vx,vy,x,y] = cv2.fitLine(cnt,cv2.DIST_L2,0,0.01,0.01)
lefty = int((-x*vy/vx) + y)
righty = int(((image.shape[1]-x)*vy/vx)+y)
cv2.line(line_mask,(image.shape[1]-1,righty),(0,lefty),[255,255,255],2)
Bitwise-and filled convex hull and linear regression masks together
# Bitwise-and the line and convex hull masks together
result = cv2.bitwise_and(line_mask, line_mask, mask=convex_mask)
result = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
Result
# Find resulting contour and draw onto original image
cnts = cv2.findContours(result, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.drawContours(image, cnts, -1, (200,100,100), 3)
Here's the result with the other input image
Full code
import cv2
import numpy as np
# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Create line mask and convex hull mask
line_mask = np.zeros(image.shape, dtype=np.uint8)
convex_mask = np.zeros(image.shape, dtype=np.uint8)
# Find convex hull on the binary image
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnt = cnts[0]
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
for i in range(defects.shape[0]):
s,e,f,d = defects[i,0]
start = tuple(cnt[s][0])
end = tuple(cnt[e][0])
far = tuple(cnt[f][0])
cv2.line(convex_mask,start,end,[255,255,255],5)
# Morph close the convex hull mask, find contours, and fill in the outline
convex_mask = cv2.cvtColor(convex_mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
convex_mask = cv2.morphologyEx(convex_mask, cv2.MORPH_CLOSE, kernel, iterations=3)
cnts = cv2.findContours(convex_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.fillPoly(convex_mask, cnts, (255,255,255))
# Perform linear regression on the binary image
[vx,vy,x,y] = cv2.fitLine(cnt,cv2.DIST_L2,0,0.01,0.01)
lefty = int((-x*vy/vx) + y)
righty = int(((image.shape[1]-x)*vy/vx)+y)
cv2.line(line_mask,(image.shape[1]-1,righty),(0,lefty),[255,255,255],2)
# Bitwise-and the line and convex hull masks together
result = cv2.bitwise_and(line_mask, line_mask, mask=convex_mask)
result = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
# Find resulting contour and draw onto original image
cnts = cv2.findContours(result, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.drawContours(image, cnts, -1, (200,100,100), 3)
cv2.imshow('thresh', thresh)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.imshow('line_mask', line_mask)
cv2.imshow('convex_mask', convex_mask)
cv2.imshow('result', result)
cv2.waitKey()
Upvotes: 3
Reputation: 17575
This is a great question. The right way to look at this is to integrate the error along each line segment. Instead of a simple term (y[i] - y_hat)^2
(where y_hat
is the predicted value from the regression line) you should have integral((y[i](t) - y_hat)^2, t, 0, 1)
where y[i](t)
is parametrization of the line segment, y[i](t) = t * y[i, 1] + (1 - t)*y[i, 0]
(denoting the endpoints of the i
-th segment as y[i, 0]
and y[i, 1]
). I think you'll find that you can compute the integral exactly and therefore you'll get terms for the sum of squared errors which involve just the endpoints. I've left out some details but I think that's enough for you to work out the rest. EDIT: Error terms should be squared; I've adjusted formulas accordingly.
2ND EDIT: I worked out some formulas (using Maxima). For a regression line represented by y = alpha*x + beta
, I get as the least squares estimates for alpha
and beta
:
alpha = (4*('sum(ll[k],k,1,n))*'sum(xx[k][2]*yy[k][2]*ll[k],k,1,n)
+2*('sum(ll[k],k,1,n))*'sum(xx[k][1]*yy[k][2]*ll[k],k,1,n)
+('sum(xx[k][2]*ll[k],k,1,n))*(-3*'sum(yy[k][2]*ll[k],k,1,n)
-3*'sum(yy[k][1]*ll[k],k,1,n))
+('sum(xx[k][1]*ll[k],k,1,n))*(-3*'sum(yy[k][2]*ll[k],k,1,n)
-3*'sum(yy[k][1]*ll[k],k,1,n))
+2*('sum(ll[k],k,1,n))*'sum(yy[k][1]*xx[k][2]*ll[k],k,1,n)
+4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))
/(4*('sum(ll[k],k,1,n))*'sum(xx[k][2]^2*ll[k],k,1,n)
+4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*xx[k][2]*ll[k],k,1,n)
-3*('sum(xx[k][2]*ll[k],k,1,n))^2
-6*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
+4*('sum(ll[k],k,1,n))*'sum(xx[k][1]^2*ll[k],k,1,n)
-3*('sum(xx[k][1]*ll[k],k,1,n))^2)
beta = -((2*'sum(xx[k][2]*ll[k],k,1,n)+2*'sum(xx[k][1]*ll[k],k,1,n))
*'sum(xx[k][2]*yy[k][2]*ll[k],k,1,n)
+('sum(xx[k][2]*ll[k],k,1,n)+'sum(xx[k][1]*ll[k],k,1,n))
*'sum(xx[k][1]*yy[k][2]*ll[k],k,1,n)
+('sum(xx[k][2]^2*ll[k],k,1,n))*(-2*'sum(yy[k][2]*ll[k],k,1,n)
-2*'sum(yy[k][1]*ll[k],k,1,n))
+('sum(xx[k][1]*xx[k][2]*ll[k],k,1,n))
*(-2*'sum(yy[k][2]*ll[k],k,1,n)-2*'sum(yy[k][1]*ll[k],k,1,n))
+('sum(xx[k][1]^2*ll[k],k,1,n))*(-2*'sum(yy[k][2]*ll[k],k,1,n)
-2*'sum(yy[k][1]*ll[k],k,1,n))
+('sum(xx[k][2]*ll[k],k,1,n)+'sum(xx[k][1]*ll[k],k,1,n))
*'sum(yy[k][1]*xx[k][2]*ll[k],k,1,n)
+2*('sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
+2*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))
/(4*('sum(ll[k],k,1,n))*'sum(xx[k][2]^2*ll[k],k,1,n)
+4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*xx[k][2]*ll[k],k,1,n)
-3*('sum(xx[k][2]*ll[k],k,1,n))^2
-6*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
+4*('sum(ll[k],k,1,n))*'sum(xx[k][1]^2*ll[k],k,1,n)
-3*('sum(xx[k][1]*ll[k],k,1,n))^2)
where xx
and yy
are lists of pairs of values, one pair for each line segment. That is, xx[k]
are the x-coordinates for the endpoints of the k
-th segment, yy[k]
are the y-coordinates for the endpoints of the k
-th segment, and ll[k]
is the length sqrt((xx[k][2] - xx[k][1])^2 + (yy[k][2] - yy[k][1])^2)
of the k
-th segment.
Here is my program to derive those formulas. Probably there are other reasonable ways to set up this problem which yield similar but different formulas.
y_hat[k](l) := alpha * x[k](l) + beta;
x[k](l) := (1 - l/ll[k]) * xx[k][1] + l/ll[k] * xx[k][2];
y[k](l) := (1 - l/ll[k]) * yy[k][1] + l/ll[k] * yy[k][2];
e[k]:= y[k](l) - y_hat[k](l);
foo : sum (integrate (e[k]^2, l, 0, ll[k]), k, 1, n);
declare (nounify (sum), linear);
[da, db] : [diff (foo, alpha), diff (foo, beta)];
map (expand, %);
bar : solve (%, [alpha, beta]);
Here are some example data and the result I get. I postponed defining dx
, dy
, and ll
because, since they are all constant terms, I didn't want them to be expanded in the solutions for alpha
and beta
.
dx[k] := xx[k][2] - xx[k][1];
dy[k] := yy[k][2] - yy[k][1];
ll[k] := sqrt (dx[k]^2 + dy[k]^2);
xx : [[1,2],[1.5,3.5],[5.5,10.5]]$
yy : [[1,2.2],[1.5,3.3],[5,12]]$
''bar, nouns, n=length(xx);
=> [[alpha = 1.133149837130799, beta = - 0.4809409869515073]]
Upvotes: 1