user1487735
user1487735

Reputation: 81

Orthogonal distance regression in MATLAB

Given two random variables/measurements (x, y), both measured with error (error-in-variables case),
is there a routine in MATLAB to calculate the estimators (a, b) of a regression line y(i)=a·x(i)+b using the method of orthogonal distance regression?

Here's my implementation of Maximum Likelihood estimator:

x= [1.0, 0.6, 1.2, 1.4, 0.2];
y=[0.5, 0.3, 0.7, 1.0, 0.2];

mx = mean(x);
my = mean(y);
p = (x(:) - mx) .^ 2;
q = (y(:) - mx) .^ 2;
w = p .* q;
sxx = sum(p);
syy = sum(q);
sxy = sum(w);    w=p.*q;   sxy=sum(w);

l = 1;  %# orthogonal distance regression
a = (syy - l * syy + sqrt((syy - l * sxx) ^ 2 + 4 * l * sxy^2)) / (2 * sxy);
b = my - a * mx;

EDIT (addressed to EitanT):

Here's a comparison of my estimators and yours:

comparison

Upvotes: 1

Views: 6276

Answers (2)

David Herrera
David Herrera

Reputation: 1

I ran the algorithms above: (1) Maximum Likelihood Estimator (2) The orthogonal distance regression method (3) MINITAB's orthogonal regression all for the same five (x,y) data points. I added the absolute value of the residuals of each regression and got the following: red green black 0.7748 0.5137 0.4485
where the red line is method (1), green is MINITAB method and (3) black line is the svd orthogonal distance regression. The black line gave the lowest absolute value residual sum. Here is the comparison on a plot:

enter image description here

Upvotes: 0

Eitan T
Eitan T

Reputation: 32930

MATLAB doesn't have a built-in function exactly like that, but you can easily find the estimators a and b with svd approximation1,2:

data = [x(:), y(:)];
[U, S, V] = svd(data - repmat(mean(data), size(data, 1), 1), 0);
a = -V(1, end) / V(2, end);
b = mean(data * V(:, end)) / V(2, end);

Which is in fact the orthogonal distance regression method.

EDIT #1:
Here's a plot of the original data, alongside my estimator and yours.

comparison

Your estimator is highly inaccurate, which brings me to believe that that your implementation is flawed.

EDIT #2:

Here's an updated plot if the computation of a is corrected to:

a=(syy-l*syy+sqrt((syy-l*sxx)^2+4*l*sxy^2)) / (2*sxy);  %# Forgot parentheses!

enter image description here

Closer, but still not as accurate as mine.

EDIT #1:

You can further improve the accuracy of sxx, syy and sxy, like so:

cov_mat = cov(x, y);
sxx = cov_mat(1, 1);  %# Same as: sxx = var(x);
syy = cov_mat(2, 2);  %# Same as: syy = var(y);
sxy = cov_mat(1, 2);  %# Same as: sxy = cov_mat(2, 1);

1 Gene H. Golub and Charles F. Van Loan (1996) "Matrix Computations" (3rd ed.). The Johns Hopkins University Press. pp 596.
2 http://en.wikipedia.org/wiki/Total_least_squares

Upvotes: 3

Related Questions