I am trying to calculate the coefficients of a multivariate linear regression using the `statsmodels` library. The problem is that with the code below I get the error

    ValueError: endog and exog matrices are different sizes

I get this error because, with this example, the `y` set has 4 elements, while the `X` matrix built inside `reg_m` ends up holding 7 ndarrays of 5 elements each. But what I don't understand is that the `x` set (not `X`) is a list with 4 lists inside (and `y` has 4 elements), where each inner list is composed of 7 variables. To me, `x` and `y` have the same number of elements.

How can I fix this error?
import numpy as np
import statsmodels.api as sm

def test_linear_regression():
    x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]
    y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]
    reg_m(y, x)

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    y.append(1)
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

if __name__ == "__main__":
    test_linear_regression()
Assuming each list in `x` corresponds to each value of `y`:
x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
[0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
[0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]
]
y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]
def reg_m(x, y):
    x = np.array(x)
    y = np.array(y)
    # add a constant column of ones for the y intercept
    X = np.insert(x, 0, np.ones((1,)), axis=1)
    # or, if you REALLY want to use add_constant to add the ones, use this:
    # X = sm.add_constant(x, has_constant='add')
    return sm.OLS(y, X).fit()

model = reg_m(x, y)
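To see why the original version failed while this one works, a quick shape check helps (a sketch reusing the same data; the variable names here are just for illustration):

    import numpy as np

    x = [[0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102249463.0, 44055788.0, 9.0, 2.0, 32000.0, 49222464.0],
         [0.0, 1102259506.0, 44049537.0, 10.0, 2.0, 32000.0, 49222464.0]]
    y = [71.7554421425, 37.5205008984, 44.9945571423, 53.5441429615]

    # Corrected design matrix: one row per observation, constant column first
    X = np.insert(np.array(x), 0, np.ones((1,)), axis=1)
    print(X.shape)      # (4, 8) -- 4 rows, matching len(y) == 4

    # The original code stacked x[0] (a single observation of 7 features)
    # as if it were a column of 7 observations:
    ones = np.ones(len(x[0]))
    X_bad = np.column_stack((x[0], ones))
    print(X_bad.shape)  # (7, 2) -- 7 "observations" vs. 4 responses -> ValueError

`sm.OLS(endog, exog)` requires the number of rows of `exog` to equal the length of `endog`, which is exactly what the error message is complaining about.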
To see a summary printout of the model, just call `model.summary()`:
"""
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.450
Model: OLS Adj. R-squared: -0.649
Method: Least Squares F-statistic: 0.4096
Date: Thu, 07 Jul 2016 Prob (F-statistic): 0.741
Time: 21:50:12 Log-Likelihood: -14.665
No. Observations: 4 AIC: 35.33
Df Residuals: 1 BIC: 33.49
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const -1.306e-07 2.18e-07 -0.599 0.657 -2.9e-06 2.64e-06
x1 -3.086e-11 5.15e-11 -0.599 0.657 -6.86e-10 6.24e-10
x2 -0.0001 0.000 -0.900 0.534 -0.002 0.002
x3 0.0031 0.003 0.900 0.534 -0.041 0.047
x4 16.0236 26.761 0.599 0.657 -324.006 356.053
x5 8.321e-12 9.25e-12 0.900 0.534 -1.09e-10 1.26e-10
x6 1.331e-07 1.48e-07 0.900 0.534 -1.75e-06 2.01e-06
x7 0.0002 0.000 0.900 0.534 -0.003 0.003
==============================================================================
Omnibus: nan Durbin-Watson: 1.500
Prob(Omnibus): nan Jarque-Bera (JB): 0.167
Skew: -0.000 Prob(JB): 0.920
Kurtosis: 2.000 Cond. No. inf
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
[3] The smallest eigenvalue is 0. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
"""