Reputation: 3621
I have an over-determined system of 2D data. I am using the Eigen library to compute the linear regression line. The data is in the form of A x = b
, where A
is an nx1 matrix and b
is an n-sized vector.
When I run the SVD, I calculate a slope, and the line passes through the origin (i.e., there is no Y-intercept). For data which has a trend line that does not pass through the origin, this doesn't result in the line I'm looking for.
Here is an example:
//Eigen
#include <Eigen/Dense>
//std
#include <cstdlib>
#include <iostream>
int main() {
Eigen::MatrixXd A(15,1);
Eigen::VectorXd b(15);
A << -4, -4, -4, -3, -3, -2, -2, -1, -1, -1, 0, 1, 1, 2, 2;
b << 11.8, 10.9, 11.5, 9.6, 8.4, 7.4, 6.2, 4.8, 5.4, 4.5, 3.5, 1.5, 0.1, -0.5, -2;
//Calculate the SVD and solve for the regression line
Eigen::MatrixXd x = A.bdcSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(b);
Eigen::MatrixXd b_reg = A*x;
std::cout << x << std::endl;
std::cout << b_reg << std::endl;
return EXIT_SUCCESS;
}
The trend line I expect is y = -2.079x + 2.907. My program above reports x
to be -2.714, and the line passing through the origin.
Is there an easy way to use the SVD to recover the "offset" regression line? Or should I be using something other than the SVD? in which case, what?
Upvotes: 2
Views: 2218
Reputation: 91
I had this exact same question today and got it working with the Eigen library.
Eigen assumes that all data relevant to your problem are encoded in the matrices you pass. In your case, you're telling eigen that all functions are of the form:
y = m*x
because your A matrix is 1-dimensional. What you want, of course, is:
y = m*x + b
How can you make Eigen consider that? Let's re-write this in another form because you already used b in your code from traditional linear algebra formatting:
b = p1*A + p2
where p1 and p2 are the parameters you want to solve for. If you have a list of A and b values (which you do, and these are the x and y values from the first 2 equations), you could write it in vector notation as:
Ap=b
where p = {p1, p2} and b = {b1, b2, b3... } (as many entries as you have data)
Now think about this: If p is a 2x1 matrix, and b is a Nx1 matrix (N = the number of data points you have), what dimension does A need to be? It must be Nx2.
"Well what goes in the other column of A?" you ask? Look at a single case:
A11 * p1 + A12 * p2 = b1
Ah but wait, p2 is constant, and is the y-intercept you were trying to solve for. Therefore:
A12 * p2 = p2
A22 * p2 = p2
A32 * p2 = p2
...
How can this be? If all values in the second column of A are 1's.
So... in your code, make A a 2-column matrix, and fill the second column with 1's. The result will give you the two parameters you're looking for. You can output the values via the example in the documentation here: JacobiSVD
Upvotes: 2