Reputation: 9835
I'm trying to fit a line segment to a set of points, but I have trouble finding an algorithm for it. I have a 2D line segment L and a set of 2D points C. L can be represented in any suitable way (I don't care), like a support and direction vector, two points, a linear equation with left and right bounds, ... The only important thing is that the line has a beginning and an end, so it is not infinite.
I want to fit L to C, so that the sum of all distances from c to L (where c is a point in C) is minimized. This is a least-squares problem, but I think I cannot use polynomial fitting, because L is only a segment. My mathematical knowledge in that area is a bit lacking, so any hints on further reading would be appreciated as well.
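In other words, if the segment is written as s(t) = v_0 + t*(v_1 - v_0) for t in [0, 1], I am looking for the end points v_0, v_1 that minimize

    sum over c in C of ( min over t in [0, 1] of ||c - s(t)|| )^2

i.e. each point contributes its (squared) distance to the closest point of the segment.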
Here is an illustration of my problem:
The orange line should be fitted to the blue points so that the sum of squared distances from each point to the line is minimal. I don't mind if the solution is in a different language, or not code at all, as long as I can extract an algorithm from it.
Since this is more of a mathematical question, I'm not sure whether it's OK for SO or whether it should be moved to Cross Validated or Mathematics Stack Exchange.
Upvotes: 3
Views: 1517
Reputation: 521
This solution is fairly similar to one already posted here, but I think it is slightly more efficient, more elegant, and easier to understand, which is why I posted it despite the similarity.
As was already written, the min(max(...)) formulation makes it hard to solve this problem analytically, which is why scipy.optimize fits well.
The solution is based on the mathematical formulation for the distance between a point and a finite line segment outlined in https://math.stackexchange.com/questions/330269/the-distance-from-a-point-to-a-line-segment
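Concretely, for a segment s(t) = v_0 + t*(v_1 - v_0) with t in [0, 1], the parameter of the segment point closest to a point p is the clamped projection

    t* = min(max( ((p - v_0) . (v_1 - v_0)) / ||v_1 - v_0||^2 , 0), 1)

and the point's distance to the segment is ||p - s(t*)||. The function below evaluates exactly this, vectorized over all points at once.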
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize, NonlinearConstraint

def calc_distance_from_point_set(v_):
    # v_ is accepted as a 1d array to make it easier to use with scipy.optimize
    # Reshape into the two end points of the segment, each a (2, 1) column
    v = (v_[:2].reshape(2, 1), v_[2:].reshape(2, 1))
    # Calculate t* for s(t*) = v_0 + t*(v_1 - v_0) w.r.t. each point, clamped to [0, 1]
    t_star_matrix = np.minimum(np.maximum(np.matmul(P - v[0].T, v[1] - v[0]) / np.linalg.norm(v[1] - v[0])**2, 0), 1)
    # Calculate s(t*)
    s_t_star_matrix = v[0] + t_star_matrix.ravel() * (v[1] - v[0])
    # Distance between every point and its closest point on the segment
    distance_from_every_point = np.linalg.norm(P.T - s_t_star_matrix, axis=0)
    return np.sum(distance_from_every_point)

if __name__ == '__main__':
    # Random points from a bounding box (P is used as a global inside the objective)
    box_1 = np.random.uniform(-5, 5, 20)
    box_2 = np.random.uniform(-5, 5, 20)
    P = np.stack([box_1, box_2], axis=1)

    # Fix the segment length via an equality constraint (lb == ub);
    # with a constraint present, minimize defaults to the SLSQP method
    segment_length = 3
    segment_length_constraint = NonlinearConstraint(
        fun=lambda x: np.linalg.norm(np.array([x[0], x[1]]) - np.array([x[2], x[3]])),
        lb=[segment_length], ub=[segment_length])
    point = minimize(calc_distance_from_point_set, (0.0, 0.0, 1.0, 1.0),
                     options={'maxiter': 100, 'disp': True},
                     constraints=segment_length_constraint).x

    plt.scatter(box_1, box_2)
    plt.plot([point[0], point[2]], [point[1], point[3]])
    plt.show()
Example result:
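Note that this minimizes the plain sum of distances, while the question asks for the sum of squared distances. If you want the least-squares objective instead, a minimal variant (my sketch; it reuses the global P exactly like the function above) squares each per-point distance before summing, and can be passed to the same constrained minimize call with no other changes:

def calc_squared_distance_from_point_set(v_):
    # Same clamped projection as above, but a least-squares objective:
    # square each point-to-segment distance before summing
    v = (v_[:2].reshape(2, 1), v_[2:].reshape(2, 1))
    t_star = np.minimum(np.maximum(np.matmul(P - v[0].T, v[1] - v[0]) / np.linalg.norm(v[1] - v[0])**2, 0), 1)
    s_t_star = v[0] + t_star.ravel() * (v[1] - v[0])
    return np.sum(np.linalg.norm(P.T - s_t_star, axis=0)**2)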
Upvotes: 1
Reputation: 1116
Here is a proposition in Python. The distance between the points and the line is computed based on the approach proposed here: Fit a line segment to a set of points
The fact that the segment has a finite length, which imposes the use of the min and max functions (or of if tests to decide whether the perpendicular distance or the distance to one of the end points has to be used), makes it really difficult (impossible?) to get an analytic solution.
The proposed solution thus uses an optimization algorithm to approach the best solution. It uses scipy.optimize.minimize, see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
Since the segment length is fixed, we have only three degrees of freedom. In the proposed solution I use the x and y coordinates of the segment's starting point and the segment slope as free parameters. The getCoordinates function computes the starting and ending points of the segment from these 3 parameters and the length (see the short derivation just below).
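For reference, the offsets used in getCoordinates follow from solving slope = dy/dx together with dx^2 + dy^2 = length^2:

    dx = length / sqrt(slope^2 + 1),  dy = slope * dx

so the end point is start_pt + (dx, dy).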
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import math as m
from scipy.spatial import distance

# Plot the points and the segment
def plotFunction(points, x1, x2):
    'Plot the point cloud and the fitted segment'
    plt.plot(points[:, 0], points[:, 1], 'ro')
    plt.plot([x1[0], x2[0]], [x1[1], x2[1]])
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.show()

# Get the sum of the distances between all the points and the segment.
# The segment is defined by guess and length, where:
# guess[0] = x coordinate of the starting point
# guess[1] = y coordinate of the starting point
# guess[2] = slope
# Since distance is always > 0, no need to use root mean square values
def getDist(guess, points, length):
    start_pt = np.array([guess[0], guess[1]])
    slope = guess[2]
    [x1, x2] = getCoordinates(start_pt, slope, length)
    total_dist = 0
    # Loop over each point to get the distance between the point and the segment
    for pt in points:
        total_dist += minimum_distance(x1, x2, pt, length)
    return total_dist

# Return minimum distance between line segment x1-x2 and point pt
# Adapted from https://stackoverflow.com/questions/849211/shortest-distance-between-a-point-and-a-line-segment
def minimum_distance(x1, x2, pt, length):
    length2 = length**2  # i.e. |x1-x2|^2 - reuse the known length to avoid a sqrt
    if length2 == 0.0:
        # Degenerate segment: both end points coincide
        return distance.euclidean(pt, x1)
    # Consider the line extending the segment, parameterized as x1 + t (x2 - x1).
    # We find the projection of point pt onto the line.
    # It falls where t = [(pt-x1) . (x2-x1)] / |x2-x1|^2
    # We clamp t to [0,1] to handle points whose projection falls outside the segment.
    t = max(0, min(1, np.dot(pt - x1, x2 - x1) / length2))
    projection = x1 + t * (x2 - x1)  # Closest point on the segment
    return distance.euclidean(pt, projection)

# Get coordinates of start and end point of the segment from start_pt,
# slope and length, obtained by solving slope=dy/dx, dx^2+dy^2=length^2
def getCoordinates(start_pt, slope, length):
    x1 = start_pt
    dx = length / m.sqrt(slope**2 + 1)
    dy = slope * dx
    x2 = start_pt + np.array([dx, dy])
    return [x1, x2]

if __name__ == '__main__':
    # Generate random points
    num_points = 20
    points = np.random.rand(num_points, 2)

    # Starting position
    length = 0.5
    start_pt = np.array([0.25, 0.5])
    slope = 0

    # Use scipy.optimize.minimize to find the best start_pt and slope combination
    res = minimize(getDist, x0=[start_pt[0], start_pt[1], slope],
                   args=(points, length), method="Nelder-Mead")

    # Retrieve best parameters
    start_pt = np.array([res.x[0], res.x[1]])
    slope = res.x[2]
    [x1, x2] = getCoordinates(start_pt, slope, length)

    print("\n** The best segment found is defined by:")
    print("\t** start_pt:\t", x1)
    print("\t** end_pt:\t", x2)
    print("\t** slope:\t", slope)
    print("** The total distance is:", getDist([x1[0], x1[1], slope], points, length), "\n")

    # Plot results
    plotFunction(points, x1, x2)
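One caveat worth adding: the objective is non-convex, so Nelder-Mead may stop in a local minimum depending on the initial guess. A simple workaround (a sketch, assuming points and length are defined as above) is to run a few random restarts and keep the best result:

# Multi-start wrapper: try several random initial guesses and keep
# the run with the lowest total distance
best = None
for _ in range(10):
    x0 = [np.random.rand(), np.random.rand(), np.random.randn()]
    r = minimize(getDist, x0=x0, args=(points, length), method="Nelder-Mead")
    if best is None or r.fun < best.fun:
        best = r
start_pt = np.array([best.x[0], best.x[1]])
slope = best.x[2]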
Upvotes: 0