Reputation: 81
Given two random variables/measurements (x, y), both measured with error (error-in-variables case),
is there a routine in MATLAB to calculate the estimators (a, b) of a regression line y(i)=a·x(i)+b using the method of orthogonal distance regression?
Here's my implementation of Maximum Likelihood estimator:
x= [1.0, 0.6, 1.2, 1.4, 0.2];
y=[0.5, 0.3, 0.7, 1.0, 0.2];
mx = mean(x);
my = mean(y);
p = (x(:) - mx) .^ 2;
q = (y(:) - mx) .^ 2;
w = p .* q;
sxx = sum(p);
syy = sum(q);
sxy = sum(w); w=p.*q; sxy=sum(w);
l = 1; %# orthogonal distance regression
a = (syy - l * syy + sqrt((syy - l * sxx) ^ 2 + 4 * l * sxy^2)) / (2 * sxy);
b = my - a * mx;
EDIT (addressed to EitanT):
Here's a comparison of my estimators and yours:
Upvotes: 1
Views: 6276
Reputation: 1
I ran the algorithms above: (1) Maximum Likelihood Estimator (2) The orthogonal distance regression method (3) MINITAB's orthogonal regression all for the same five (x,y) data points. I added the absolute value of the residuals of each regression and got the following:
red green black
0.7748 0.5137 0.4485
where the red line is method (1), green is MINITAB method and (3) black line is the svd orthogonal distance regression. The black line gave the lowest absolute value residual sum.
Here is the comparison on a plot:
Upvotes: 0
Reputation: 32930
MATLAB doesn't have a built-in function exactly like that, but you can easily find the estimators a and b with svd
approximation1,2:
data = [x(:), y(:)];
[U, S, V] = svd(data - repmat(mean(data), size(data, 1), 1), 0);
a = -V(1, end) / V(2, end);
b = mean(data * V(:, end)) / V(2, end);
Which is in fact the orthogonal distance regression method.
EDIT #1:
Here's a plot of the original data, alongside my estimator and yours.
Your estimator is highly inaccurate, which brings me to believe that that your implementation is flawed.
EDIT #2:
Here's an updated plot if the computation of a
is corrected to:
a=(syy-l*syy+sqrt((syy-l*sxx)^2+4*l*sxy^2)) / (2*sxy); %# Forgot parentheses!
Closer, but still not as accurate as mine.
EDIT #1:
You can further improve the accuracy of sxx
, syy
and sxy
, like so:
cov_mat = cov(x, y);
sxx = cov_mat(1, 1); %# Same as: sxx = var(x);
syy = cov_mat(2, 2); %# Same as: syy = var(y);
sxy = cov_mat(1, 2); %# Same as: sxy = cov_mat(2, 1);
1 Gene H. Golub and Charles F. Van Loan (1996) "Matrix Computations" (3rd ed.). The Johns Hopkins University Press. pp 596.
2 http://en.wikipedia.org/wiki/Total_least_squares
Upvotes: 3