Georg Leber

Reputation: 3580

OLS Multiple Linear Regression with commons-math

Currently I have a dependency on commons-math 2.1, but I want to upgrade it to commons-math 3.6. Unfortunately, some test cases no longer work after the upgrade. I know what is causing the problem, but I don't know how to change the test case so that it still tests the same behavior as before.

I have the following test code:

@Test
public void testIdentityMatrix() {
    double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, {  0, 0, 0, 1 } };
    double[] y = { 1, 2, 3, 4 };

    OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
    regression.setNoIntercept(true);
    regression.newSampleData(y, x);

    double[] b = regression.estimateRegressionParameters();
    for (int i = 0; i < y.length; i++) {
        // JUnit's signature is assertEquals(expected, actual, delta)
        assertEquals(y[i], b[i], 0.001);
    }
}

After the upgrade to commons-math 3.6, OLSMultipleLinearRegression validates the given matrix x and vector y, and this validation fails with the message:

not enough data (4 rows) for this many predictors (4 predictors)

What do I need to change to correct that test case?

Upvotes: 1

Views: 1818

Answers (3)

nickzxd

Reputation: 61

I guess the 3rd row of x should be { 0, 0, 1, 0 } instead of { 0, 0, 0, 1 }?

However, if you change x to

double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 1, 0 }, { 0, 0, 0, 1 }, { 1, 1, 1, 1 } };

and change y to

double[] y = { 1, 2, 3, 4, 10 };

so that the last element of y is the sum of the other elements, then it works.
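To see why the five-row data recovers the coefficients 1, 2, 3, 4, here is a minimal sketch in plain Java that fits a no-intercept least-squares model by solving the normal equations. It is not how commons-math does it internally (the class and method names here are made up for illustration), but it should give the same coefficients for this data, and it rejects a design matrix with a repeated { 0, 0, 0, 1 } row as singular.

```java
public class NoInterceptOlsSketch {

    /**
     * Hypothetical helper: fits y = X b (no intercept) by solving the
     * normal equations (X^T X) b = X^T y with Gaussian elimination.
     */
    static double[] fit(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;

        // Build the augmented matrix [X^T X | X^T y].
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    a[i][j] += x[k][i] * x[k][j];
                }
            }
            for (int k = 0; k < n; k++) {
                a[i][p] += x[k][i] * y[k];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("design matrix is singular");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }

        // Back substitution.
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * b[j];
            }
            b[i] = s / a[i][i];
        }
        return b;
    }
}
```

Calling fit with the x and y above should return coefficients close to 1, 2, 3, 4, because the fifth observation is consistent with the first four and the fit is therefore exact.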

Upvotes: 0

Phil Steitz

Reputation: 644

This is a bug in Commons Math 3.x. When the model has no intercept, having as many observations as regressors should be fine as long as the design matrix is not singular. In your example, I think you meant the third row of x to be {0,0,1,0} (otherwise the design matrix is singular). With this change to your data and the code patch from the Hipparchus fix applied, your test succeeds. This bug is being tracked as MATH-1392 in Commons Math.
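The difference between the two rules can be sketched as follows. This is only an illustration of the logic, not the Commons Math or Hipparchus source; the class and method names are hypothetical. The strict rule is inferred from the error message in the question (4 rows rejected for 4 predictors), and the relaxed rule follows the reasoning above that a model without an intercept has one parameter fewer to estimate.

```java
public class SampleSizeCheckSketch {

    /**
     * Strict rule, consistent with the reported error:
     * there must be more rows than predictor columns.
     */
    static boolean strictRuleAccepts(int rows, int predictors) {
        return rows >= predictors + 1;
    }

    /**
     * Relaxed rule (assumption): count the intercept as an extra
     * parameter only when the model actually has one.
     */
    static boolean relaxedRuleAccepts(int rows, int predictors, boolean noIntercept) {
        int parameters = noIntercept ? predictors : predictors + 1;
        return rows >= parameters;
    }
}
```

Under the strict rule the 4x4 test data is rejected, while under the relaxed rule with noIntercept set to true it passes the size check. A singular design matrix would still fail later, when the system is solved.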

Upvotes: 2

Semmel

Reputation: 575

The number of samples has to be larger than the number of variables. Apparently your test case is not correct. You would have to add at least one more sample. If you change

double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, {  0, 0, 0, 1 } };

to

double[][] x = { { 1, 0, 0, 0 }, { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, {  0, 0, 0, 1 }, {1,0,0,0} };

it should work (although I didn't test it).

Upvotes: 0
