Reputation: 57
I'm currently trying to fit a linear line through a spread of scattered data in MATLAB. Now this is easy enough using the polyfit function where I can easily obtain my y= mx + c equation. However, I need to now fit a line along the upper boundary of my data, i.e., the top few data points. I know this description is vague, so lets assume that my scattered data will be in a shape of a cone, with its apex on the y-axis, and it spreads outwards and upwards in the +x and +y direction. I need to fit a best fit line on the 'upper edge of the cone' if you will.
I've developed an algorithm but it's extremely slow. It involves first fitting a line of best fit through ALL data, deleting all data points below this line of best fit, and iterating through until only 5% of the initial data points are left. The final best fit line will then reside close to the top edge of the cone. For 250 data points, this takes about 5s and with me dealing with more than a million data points, this algorithm is simply too inefficient.
I guess my question is: is there an algorithm to more efficiently achieve what I need? Or is there a way to sharpen up my code to eliminate unnecessary complexity?
Here is my code in MATLAB:
(As an example)
a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; %To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; %To be used as y-axis points
while prod(size(a)) > (0.05*prod(size(a))) %Iterative line fitting occurs until there are less than 5% of the data points left
lobf = polyfit(a,b,1); %Line of Best Fit for current data points
alen = length(a);
for aindex = alen:-1:1 %For loop to delete all points below line of best fit
ValLoBF = lobf(1)*a(aindex) + lobf(2)
if ValLoBF > b(aindex) %if LoBF is above current point...
a(aindex) = []; %delete x coordinate...
b(aindex) = []; %and delete its corresponding y coordinate
end
end
end
Upvotes: 2
Views: 914
Reputation: 1189
Well first of all your example code seems to be running indefinitely ;)
Some optimizations for your code:
a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; %To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; %To be used as y-axis points
n_init_a = length(a);
while length(a) > 0.05*n_init_a %Iterative line fitting occurs until there are less than 5% of the data points left
lobf = polyfit(a,b,1); % Line of Best Fit for current data points
% Delete data points below line using logical indexing
% First create values of the polyfit points using element-wise vector multiplication
temp = lobf(1)*a + lobf(2); % Containing all polyfit values
% Using logical indexing to discard all points below
a(b<temp)=[]; % First destroy a
b(b<temp)=[]; % Then b, very important!
end
Also you should try profiling your code by typing in the command window
profile viewer
and check what takes most time calculating your results. I suspect it is polyfit but that can't be sped up much probably.
Upvotes: 1
Reputation: 20915
What you are looking for is not line fitting. You are trying to find the convex hull of the points.
You should check out the function convhull
. Once you find the hull, you can remove all of the points that aren't close to it, and fit each part independently to avoid the fact that the data is noisy.
Alternatively, you could render the points onto some pixel grid, and then do some kind of morphological operation, like imclose
, and finish with Hough transform. Check out also this answer.
Upvotes: 0