Reputation: 5690
I'm trying to work out the most efficient method to find the linear regression equation (y = mx + c) for a dataset, given an n by 2 array.
Basically I want to know what the value of Y is when X is, for example, 50.
My current method leaves a lot to be desired:
inputData is my n by 2 array, with X in the first column and Y in the second.
x = 50;
for i = 1 : size(inputData,1) % for every row in the inputData array
    if (inputData(i,1) < x + 5) && (inputData(i,1) > x - 5) % if we're within 5 of the specified X value
        arrayOfCloseYValues(i) = inputData(i, 2); % add the matching Y value to the array
    end
end
y = mean(arrayOfCloseYValues) % take the mean to find Y
As you can see, my above method simply tries to find values of Y that are within 5 of the given X value and gets the mean. This is a terrible method, plus it takes absolutely ages to process.
What I really need is a robust method for calculating the linear regression for X and Y, so that I can find the value through the equation y = mx + c...
PS. In my above method I do actually pre-allocate memory and remove trailing zeros at the end, but I have removed this part for simplicity.
Upvotes: 4
Views: 6226
Reputation: 4685
Polyfit is fine, but I think your problem is a bit simpler. You have an n x 2 array of data. Using the layout from your question (column 1 is x and column 2 is y):
x = inputData(:,1);
y = inputData(:,2);
b = ones(size(x)); % column of ones for the offset term
A = [x b];
c = A\y
This gives you the least squares fit: c(1) is the slope and c(2) is the offset.
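To get from the fitted coefficients to the value the question asks for (Y when X is 50), something like the following minimal sketch should work. The sample data is made up for illustration and assumes X in column 1 and Y in column 2, as in the question; the names coeffs, xq, yq, p and yqPoly are illustrative only. It also cross-checks the backslash result against polyfit and polyval.

```matlab
% Hypothetical sample data (n-by-2): X in column 1, Y in column 2.
% These points lie exactly on y = 2x + 1, so the fit is easy to check by eye.
inputData = [10 21; 20 41; 30 61; 40 81; 60 121];
x = inputData(:,1);
y = inputData(:,2);

% Least-squares fit with backslash: coeffs(1) is the slope m, coeffs(2) the offset c.
A = [x ones(size(x))];
coeffs = A\y;

% Evaluate the fitted line y = m*x + c at the query point.
xq = 50;
yq = coeffs(1)*xq + coeffs(2);

% Cross-check: a degree-1 polyfit returns [m c], and polyval evaluates it at xq.
p = polyfit(x, y, 1);
yqPoly = polyval(p, xq);
```

Both yq and yqPoly should agree to within rounding error. With real, noisy data the fit is computed once and each prediction is just one multiply and one add, so there is no need to loop over the rows.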
Here's another way to test it:
x = transpose(0:10);
y = 0.5*x + 1 + 0.1*randn(size(x)); % as a test, m = 0.5, b=1, and add some noise
A = [x ones(size(x))];
c = A\y;
yest = c(1)*x + c(2);
plot(x,yest,x,y)
legend('y_{est}','y')
Should get you a plot of the fitted line y_{est} drawn over the noisy data y.
Upvotes: 4