Trip Ives

Reputation: 71

Math.Net Multiple Regression Is Wrong After Adding a 4th Independent Variable

I am able to generate the correct intercept and coefficients for a multiple regression (Math.Net) with up to three independent variables. However, once a fourth independent variable is added, the returned values are nowhere near correct.

Using this code:

        Dim i As Integer
        Dim g(5)() As Double

        g(0) = {1.0, 4.0, 3.2}
        g(1) = {2.0, 5.0, 4.1}
        g(2) = {3.0, 2.0, 2.5}
        g(3) = {4.0, 3.0, 1.6}
        g(4) = {4.0, 3.0, 1.6}
        g(5) = {4.0, 3.0, 1.6}

        Dim d As Double() = {3.5, 5.6, 1.2, 15.2, 3.4, 4.2}

        Dim p As Double() = MultipleRegression.QR(Of Double)(g, d, intercept:=True)

        For i = 0 To UBound(p)
            Debug.WriteLine(p(i))
        Next

I get:

-2.45972222222223
1.13194444444445
3.11805555555555
-2.38888888888889

These are correct.

However, if I run the same code but add a 4th independent variable, like this:

        Dim i As Integer
        Dim g(5)() As Double

        g(0) = {1.0, 4.0, 3.2, 5.3}
        g(1) = {2.0, 5.0, 4.1, 2.4}
        g(2) = {3.0, 2.0, 2.5, 3.6}
        g(3) = {4.0, 3.0, 1.6, 2.1}
        g(4) = {4.0, 3.0, 1.6, 2.1}
        g(5) = {4.0, 3.0, 1.6, 2.1}

        Dim d As Double() = {3.5, 5.6, 1.2, 15.2, 3.4, 4.2}

        Dim p As Double() = MultipleRegression.QR(Of Double)(g, d, intercept:=True)

        For i = 0 To UBound(p)
            Debug.WriteLine(p(i))
        Next

I get:

6.88018203734109E+17
-9.8476516475107E+16
-3.19472310972754E+16
-4.61094057074081E+16
-5.92835216238101E+16

These numbers are nowhere near correct.

If anyone can provide any direction as to what I am doing wrong, I would be very appreciative. TIA

Upvotes: 0

Views: 196

Answers (1)

phv3773

Reputation: 497

I have not worked out the math details, but looking intuitively at your problem: of the six observations, three (g(3), g(4), g(5)) have identical independent variables, yet the corresponding values of the dependent variable (15.2, 3.4, and 4.2) range from the highest in the data set to nearly the lowest. So these three rows cannot tell the regression anything about how the independent variables affect the result; together they act as a single data point. In effect, you are trying to estimate 5 values (the intercept plus four coefficients) from only four distinct observations. That cannot be determined uniquely, and it results in instability in the math.
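One way to check this intuition is to compare the rank of the design matrix (the data plus a constant column for the intercept) with the number of parameters being estimated. The following is only a sketch in plain Python, used here to keep it self-contained; the helper names `rank` and `design` are made up for illustration and are not part of Math.Net.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, using exact Gaussian elimination on fractions."""
    m = [[Fraction(str(v)) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def design(rows):
    """Prepend the constant column that fitting an intercept implies."""
    return [[1] + list(row) for row in rows]

# Data from the question: three variables, then four variables.
g3 = [[1.0, 4.0, 3.2], [2.0, 5.0, 4.1], [3.0, 2.0, 2.5],
      [4.0, 3.0, 1.6], [4.0, 3.0, 1.6], [4.0, 3.0, 1.6]]
g4 = [[1.0, 4.0, 3.2, 5.3], [2.0, 5.0, 4.1, 2.4], [3.0, 2.0, 2.5, 3.6],
      [4.0, 3.0, 1.6, 2.1], [4.0, 3.0, 1.6, 2.1], [4.0, 3.0, 1.6, 2.1]]

for name, g in (("3 variables", g3), ("4 variables", g4)):
    x = design(g)
    print(name, "rank", rank(x), "parameters", len(x[0]))
# 3 variables rank 4 parameters 4
# 4 variables rank 4 parameters 5
```

With three variables the rank equals the number of parameters, so the fit is unique. With four variables the rank stays at 4 while 5 parameters are requested, so infinitely many coefficient sets fit equally well and the solver's output is essentially arbitrary.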

I've changed your data very slightly, so that the last three rows are no longer identical, and it returns better values. (I use C#.) The problem is with the data, not the program.

        double[][] g = new double [6][];

        g[0] = new double[4] { 1.0, 4.0, 3.2, 5.3};
        g[1] = new double[4] { 2.0, 5.0, 4.1, 2.4};
        g[2] = new double[4] { 3.0, 2.0, 2.5, 3.6};
        g[3] = new double[4] { 4.0, 3.0, 1.6, 2.12};
        g[4] = new double[4] { 4.0, 3.0, 1.6, 2.11};
        g[5] = new double[4] { 4.0, 3.0, 1.6, 2.1};

        double[] d = new double[6] { 3.5, 5.6, 1.2, 15.2, 3.4, 4.2 };

        var p = MultipleRegression.QR(g, d, true);

        for (int i = 0; i < p.Length; i++) Console.WriteLine(p[i].ToString());

This returns:

-6386.81388888898
913.902777777791
297.597222222225
428.444444444452
550.000000000007
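These values are finite, but they are still driven by the three nearly identical rows. As far as I can tell, the first three rows can be fitted exactly by the intercept and the other coefficients, so the coefficient on the 4th variable is decided entirely by rows g[3] to g[5], which now differ only by 0.01 in that variable. A small sketch in plain Python (not Math.Net) of the ordinary one-variable least-squares slope over just those three rows:

```python
# The three rows that differ only in the 4th variable, and their results.
x4 = [2.12, 2.11, 2.10]
y = [15.2, 3.4, 4.2]

mean_x = sum(x4) / len(x4)
mean_y = sum(y) / len(y)

# Ordinary least-squares slope of y against x4 for these three points.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x4, y))
         / sum((a - mean_x) ** 2 for a in x4))

print(round(slope, 6))  # 550.0
```

This matches the last coefficient returned above (550.000000000007). A result that swings by roughly 11 units over a change of 0.02 in the input is being fitted as if it were signal, so a different small tweak to the data would give very different coefficients.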

Upvotes: 1
