Briandmg
Briandmg

Reputation: 57

Efficient algorithm to fit a linear line along the upper boundary of data only

I'm currently trying to fit a linear line through a spread of scattered data in MATLAB. Now this is easy enough using the polyfit function where I can easily obtain my y= mx + c equation. However, I need to now fit a line along the upper boundary of my data, i.e., the top few data points. I know this description is vague, so lets assume that my scattered data will be in a shape of a cone, with its apex on the y-axis, and it spreads outwards and upwards in the +x and +y direction. I need to fit a best fit line on the 'upper edge of the cone' if you will.

I've developed an algorithm but it's extremely slow. It involves first fitting a line of best fit through ALL data, deleting all data points below this line of best fit, and iterating through until only 5% of the initial data points are left. The final best fit line will then reside close to the top edge of the cone. For 250 data points, this takes about 5s and with me dealing with more than a million data points, this algorithm is simply too inefficient.

I guess my question is: is there an algorithm to more efficiently achieve what I need? Or is there a way to sharpen up my code to eliminate unnecessary complexity?

Here is my code in MATLAB:

(As an example)

a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; %To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; %To be used as y-axis points

while prod(size(a)) > (0.05*prod(size(a))) %Iterative line fitting occurs until there are less than 5% of the data points left

      lobf = polyfit(a,b,1); %Line of Best Fit for current data points

      alen = length(a);

      for aindex = alen:-1:1 %For loop to delete all points below line of best fit

            ValLoBF = lobf(1)*a(aindex) + lobf(2)

            if ValLoBF > b(aindex) %if LoBF is above current point...
                   a(aindex) = []; %delete x coordinate...
                   b(aindex) = []; %and delete its corresponding y coordinate
            end
      end

end

Upvotes: 2

Views: 914

Answers (2)

EJG89
EJG89

Reputation: 1189

Well first of all your example code seems to be running indefinitely ;)

Some optimizations for your code:

a = [4, 5, 1, 8, 1.6, 3, 8, 9.2]; %To be used as x-axis points
b = [45, 53, 12, 76, 25, 67, 75, 98]; %To be used as y-axis points

n_init_a = length(a);

while length(a) > 0.05*n_init_a %Iterative line fitting occurs until there are less     than 5% of the data points left

  lobf = polyfit(a,b,1); % Line of Best Fit for current data points

  % Delete data points below line using logical indexing
  % First create values of the polyfit points using element-wise vector multiplication
  temp = lobf(1)*a + lobf(2); % Containing all polyfit values
  % Using logical indexing to discard all points below
  a(b<temp)=[]; % First destroy a
  b(b<temp)=[]; % Then b, very important!

end

Also you should try profiling your code by typing in the command window

profile viewer

and check what takes most time calculating your results. I suspect it is polyfit but that can't be sped up much probably.

Upvotes: 1

Andrey Rubshtein
Andrey Rubshtein

Reputation: 20915

What you are looking for is not line fitting. You are trying to find the convex hull of the points. enter image description here

You should check out the function convhull. Once you find the hull, you can remove all of the points that aren't close to it, and fit each part independently to avoid the fact that the data is noisy.


Alternatively, you could render the points onto some pixel grid, and then do some kind of morphological operation, like imclose, and finish with Hough transform. Check out also this answer.

Upvotes: 0

Related Questions