xeon

Reputation: 345

Calculate coefficients in a multivariate linear regression

I am trying to calculate the coefficients of a multivariate linear regression using the statsmodels library. The problem is that with this code I get the error ValueError: endog and exog matrices are different sizes. I get this error because, with this example, the y set has 4 elements, while the X set is a list of 7 ndarrays, each with 5 elements.

What I don't understand is this: the x set (not X) is a list of 4 lists (and y has 4 elements), where each inner list is composed of 7 variables. As far as I can tell, x and y have the same number of elements.

How can I fix this error?

import numpy as np
import statsmodels.api as sm

def test_linear_regression():
    x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]

    y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]
    reg_m(y, x)

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    y.append(1)
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results


if __name__ == "__main__":
    test_linear_regression()
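
For context on where the shapes go wrong: np.column_stack treats the 7-element list x[0] as a column, so X ends up with one row per feature instead of one per sample, while y.append(1) leaves y with 5 entries. A minimal check (a diagnostic sketch; the print is an addition for illustration, not part of the original code):

import numpy as np

x0 = [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0]
X = np.column_stack((x0, np.ones(len(x0))))
print(X.shape)  # (7, 2): one row per feature of the first sample, not per sample

X keeps those 7 rows through the loop, so sm.OLS sees 5 y values against 7 rows of X, which is exactly the size mismatch the error reports.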

Upvotes: 0

Views: 1440

Answers (1)

Jarad

Reputation: 18883

Assuming each list in x corresponds to each value of y:

import numpy as np
import statsmodels.api as sm

x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
     [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]

y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]

def reg_m(x, y):
    x = np.array(x)
    y = np.array(y)

    # prepend a column of ones so the model gets a y intercept
    X = np.insert(x, 0, np.ones((1,)), axis=1)

    # or, if you REALLY want to use add_constant to add the ones, use this:
    # X = sm.add_constant(x, has_constant='add')

    return sm.OLS(y, X).fit()

model = reg_m(x, y)
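
With the rows of x as observations, y has 4 entries and X is a 4 x 8 design matrix (the ones column plus 7 features), so the shapes now agree. A quick sanity check (a sketch, assuming the x and y above):

X = np.insert(np.array(x), 0, np.ones((1,)), axis=1)
print(np.array(y).shape, X.shape)  # (4,) (4, 8): one row per observation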

To see a summary printout of the model, just call model.summary():

"""
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.450
Model:                            OLS   Adj. R-squared:                 -0.649
Method:                 Least Squares   F-statistic:                    0.4096
Date:                Thu, 07 Jul 2016   Prob (F-statistic):              0.741
Time:                        21:50:12   Log-Likelihood:                -14.665
No. Observations:                   4   AIC:                             35.33
Df Residuals:                       1   BIC:                             33.49
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const      -1.306e-07   2.18e-07     -0.599      0.657      -2.9e-06  2.64e-06
x1         -3.086e-11   5.15e-11     -0.599      0.657     -6.86e-10  6.24e-10
x2            -0.0001      0.000     -0.900      0.534        -0.002     0.002
x3             0.0031      0.003      0.900      0.534        -0.041     0.047
x4            16.0236     26.761      0.599      0.657      -324.006   356.053
x5          8.321e-12   9.25e-12      0.900      0.534     -1.09e-10  1.26e-10
x6          1.331e-07   1.48e-07      0.900      0.534     -1.75e-06  2.01e-06
x7             0.0002      0.000      0.900      0.534        -0.003     0.003
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.500
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.167
Skew:                          -0.000   Prob(JB):                        0.920
Kurtosis:                       2.000   Cond. No.                          inf
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is      0. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
"""

Upvotes: 1
