Reputation: 431
Can someone help me figure out how to draw a perfect straight line from given set of points passing through majority (may be 5-6 point out of 20) of these points? Please note that this is not a line fit problem, but perfect line should be drawn (horizontal, vertical or somewhat slanted) between majority of the given points.
Here is MATLAB code:
e=[161 162 193 195 155 40 106 102 125 155 189 192 186 188 185 186 147 148 180 183];
f =[138 92 92 115 258 124 218 114 125 232 431 252 539 463 643 571 582 726 726 676];
figure;scatter(f, e, 5, 'red');
axis ij;
and here is image:
Upvotes: 2
Views: 1065
Reputation: 1631
Since you want the line to go through the majority of points, it sounds quite like a line fitting problem even though you say it isn't. Have you looked at the Theil-Sen estimator (for example this one on fex), which is a linear regression ignoring up to some 30% of the outliers.
If you simply want a line through the extrema you might do something like this:
% Setup data
e = [161 162 193 195 155 40 106 102 125 155 189 192 186 188 185 186 147 148 180 183];
f = [138 92 92 115 258 124 218 114 125 232 431 252 539 463 643 571 582 726 726 676];
% Create scatterplot
figure(1);
scatter(f, e, 5, 'red');
axis ij;
% Fit extrema
[min_e, min_idx_e] = min(e);
[max_e, max_idx_e] = max(e);
[min_f, min_idx_f] = min(f);
[max_f, max_idx_f] = max(f);
% Determine largest range and draw line accordingly
if (max_e-min_e)>(max_f-min_f)
line(f([min_idx_e, max_idx_e]), e([min_idx_e, max_idx_e]), 'color', 'blue')
text(f(max_idx_e), e(max_idx_e), ' Extrema')
else
line(f([min_idx_f, max_idx_f]), e([min_idx_f, max_idx_f]), 'color', 'blue')
text(f(max_idx_f), e(max_idx_f), ' Extrema')
end
% Fit using Theil-Sen estimator
[m, e0] = Theil_Sen_Regress(f', e');
line([min_f, max_f], m*[min_f, max_f]+e0, 'color', 'black')
text(max_f, m*max_f+e0, ' Theil-Sen')
However, as you'll notice neither solution automatically fits the points automatically, simply because there are too many outliers, unless you filter those beforehand. Therefore you probably are better off using the RANSAC algorithm as proposed by Shai and McMa.
Upvotes: 4
Reputation: 1798
An easy but not very efficient solution would be to compute slope between each two points and if a set of points lie on a straight line, all pairs of these set would have the same slope. So one algorithm could pick all the poirs with same slope and connect them if they have one point in common. finally you have to choose the largest set. The time complexity of this algorithm will be O(N^2 log N) which N is number of points. As I see in your figure there is not a real perfect line going through all the points, rather there is a tolerance which in this algorithm could be defined as the criteria by which you connect two pairs. say if two slopes are different by less than 2 percent we connect the pairs.
Upvotes: 2
Reputation: 1566
That's a textbook example for the RANSAC algorithm. This free toolbox for Matlab actually has some very nice examples of line fitting.
Upvotes: 3