Reputation: 3409

Find the line that best fits the data

I'm trying to find the line that best fits my data. I use the code below, but now I want to place the data into an array sorted so that the points closest to the line come first. How can I do this? Also, is polyfit the correct function to use for this?

x=[1,2,2.5,4,5];
y=[1,-1,-.9,-2,1.5];
n=1;
p = polyfit(x,y,n)

f = polyval(p,x);
plot(x,y,'o',x,f,'-')

PS: I'm using Octave 4.0, which is similar to Matlab.

Upvotes: 2

Answers (3)

Rick T

Reputation: 3409

Here's some test code that may help someone else dealing with linear regression and least squares:

%https://youtu.be/m8FDX1nALSE matlab code

%https://youtu.be/1C3olrs1CUw good video to work out by hand if you want to test

function [a0, a1] = rtlinreg(x,y)
  x = x(:);  % force column vectors
  y = y(:);
  n = length(x);
  a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - (sum(x))^2);  % a1 is the slope of the linear model
  a0 = mean(y) - a1*mean(x);  % a0 is the y-intercept
end

x=[65,65,62,67,69,65,61,67]'
y=[105,125,110,120,140,135,95,130]'

[a0 a1] = rtlinreg(x,y);  %a1 is the slope of linear model, a0 is the y-intercept

x_model =min(x):.001:max(x);
y_model = a0 + a1.*x_model;  %y=-186.47 +4.70x   
plot(x,y,'x',x_model,y_model)

Upvotes: -1

David Kaftan

Reputation: 2174

Sembei Norimaki did a good job of explaining your primary question, so I will look at your secondary question: is polyfit the right function?

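The best-fit line, in the least-squares sense, is the line that minimizes the sum of squared errors between the modeled y and the actual y. (When the model includes an intercept, the resulting errors also have a mean of zero.)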

Since it must be a "line", we can use polyfit, which fits a polynomial of the degree you ask for; a "line" is simply a first-degree polynomial. First-degree polynomials also have some properties that make them easy to deal with directly. The first-order polynomial (or linear) equation you are looking for has this form:

y = mx + b

where y is your dependent variable and x is your independent variable. So the challenge is this: find the m and b such that the modeled y is as close to the actual y as possible. As it turns out, the squared error associated with a linear fit is convex in m and b, meaning it has a single minimum value. To find where that minimum occurs, it is simplest to combine the x vector with a column of ones for the bias b, as follows:

Xcombined = [x.' ones(length(x),1)];

Then use the normal equation, which is derived from minimizing the error:

beta = inv(Xcombined.'*Xcombined)*(Xcombined.')*(y.')

Great, now our line is defined as Y = Xcombined*beta. To draw the line, simply sample some range of x and append the column of ones for the b term:

Xplot = [(0:.1:5).' ones(length(0:.1:5),1)];
Yplot = Xplot*beta;
plot(Xplot(:,1), Yplot);  % plot against the x column only; the ones column is not a coordinate
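
As a cross-check of the normal-equation fit, here is a minimal sketch that assumes the x and y row vectors from the question. The names beta_inv, beta_bs and residuals are introduced only for this illustration. It solves the same least-squares problem twice, once with the explicit inverse and once with the backslash operator, and then looks at the mean of the residuals.

```octave
% Data from the question (row vectors).
x = [1, 2, 2.5, 4, 5];
y = [1, -1, -0.9, -2, 1.5];

% Design matrix: x column for the slope, column of ones for the intercept.
Xcombined = [x.' ones(length(x), 1)];

% Normal equation with an explicit inverse.
beta_inv = inv(Xcombined.' * Xcombined) * (Xcombined.') * (y.');

% Same least-squares problem solved with backslash,
% which avoids forming the inverse explicitly.
beta_bs = Xcombined \ y.';

% Residuals of the fitted line; with an intercept column their mean is ~0.
residuals = y.' - Xcombined * beta_bs;

fprintf('slope = %.4f, intercept = %.4f\n', beta_bs(1), beta_bs(2));
fprintf('largest difference between the two solutions = %g\n', max(abs(beta_inv - beta_bs)));
fprintf('mean residual = %g\n', mean(residuals));
```

For a small, well-conditioned problem like this one the two solutions should agree to rounding error. The backslash form is generally the safer habit when the columns of the design matrix are nearly dependent.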

So why does polyfit work so poorly? Well, I can't say for sure, but my hypothesis is that you need to transpose your x and y vectors. I would guess that doing so would give you a much more reasonable line.

x = x.';
y = y.';

Then try:

p = polyfit(x,y,n)
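
One way to test the transpose hypothesis is to run polyfit on both orientations and compare the coefficients. The sketch below assumes the x and y from the question; p_row and p_col are names made up for this check. As far as I know, polyfit accepts vectors of either orientation, so the two results would be expected to match. If they do, the poor-looking fit comes from the data not following a line rather than from the orientation of the inputs.

```octave
% Data from the question (row vectors).
x = [1, 2, 2.5, 4, 5];
y = [1, -1, -0.9, -2, 1.5];

p_row = polyfit(x, y, 1);       % row-vector inputs
p_col = polyfit(x.', y.', 1);   % column-vector inputs

fprintf('row:    slope = %.4f, intercept = %.4f\n', p_row(1), p_row(2));
fprintf('column: slope = %.4f, intercept = %.4f\n', p_col(1), p_col(2));
fprintf('largest coefficient difference = %g\n', max(abs(p_row(:) - p_col(:))));
```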

I hope this helps. A wise man once told me (and as I learn every day), don't trust an algorithm you do not understand!

Upvotes: 1

user2261062

You can first compute the error between the real value y and the predicted value f:

err = abs(y-f);

Then sort the error vector in ascending order, keeping the sort indexes:

[val, idx] = sort(err);

And use those indexes to reorder your y values:

y2 = y(idx);

Now y2 has the same values as y, but ordered so that the ones closest to the fitted line come first.

Do the same for x to compute x2, so that x2 and y2 still correspond point by point:

x2 = x(idx);
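
Putting the steps together, here is a minimal end-to-end sketch using the data and the polyfit/polyval calls from the question. The variable sorted is a name introduced here for illustration; it collects each point and its distance in one row.

```octave
% Data from the question (row vectors).
x = [1, 2, 2.5, 4, 5];
y = [1, -1, -0.9, -2, 1.5];

p = polyfit(x, y, 1);     % fit a first-degree polynomial (a line)
f = polyval(p, x);        % predicted y at each x

err = abs(y - f);         % vertical distance of each point from the line
[val, idx] = sort(err);   % ascending order: closest point first

% One row per point: x, y, distance from the line.
sorted = [x(idx).' y(idx).' val.'];
disp(sorted);
```

Note that err is the vertical distance, not the perpendicular one. For a single fixed line the two differ only by the constant factor 1/sqrt(1 + m^2), where m is the slope, so the ordering is the same either way.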

Upvotes: 2
