from statsmodels.regression.linear_model import OLS
import numpy as np
X = np.array([[1,2,3],[4,7,5]])
y = np.array([1,2])
mod = OLS(X,y)
res = mod.fit()
print(res.summary())
This gives the following error:
ValueError Traceback (most recent call last)
<ipython-input-78-5e3dfbfe5426> in <module>
6 mod = OLS(X,y)
7 res = mod.fit()
----> 8 print(res.summary())
~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in summary(self, yname, xname, title, alpha)
2483
2484 rsquared_type = '' if self.k_constant else ' (uncentered)'
-> 2485 top_right = [('R-squared' + rsquared_type + ':', ["%#8.3f" % self.rsquared]),
2486 ('Adj. R-squared' + rsquared_type + ':', ["%#8.3f" % self.rsquared_adj]),
2487 ('F-statistic:', ["%#8.4g" % self.fvalue]),
~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\tools\decorators.py in __get__(self, obj, type)
91 _cachedval = _cache.get(name, None)
92 if _cachedval is None:
---> 93 _cachedval = self.fget(obj)
94 _cache[name] = _cachedval
95
~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in rsquared(self)
1636 return 1 - self.ssr/self.centered_tss
1637 else:
-> 1638 return 1 - self.ssr/self.uncentered_tss
1639
1640 @cache_readonly
~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\tools\decorators.py in __get__(self, obj, type)
91 _cachedval = _cache.get(name, None)
92 if _cachedval is None:
---> 93 _cachedval = self.fget(obj)
94 _cache[name] = _cachedval
95
~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in ssr(self)
1582 """Sum of squared (whitened) residuals."""
1583 wresid = self.wresid
-> 1584 return np.dot(wresid, wresid)
1585
1586 @cache_readonly
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
It seems to only work when X has dimensions n by 1, with n the number of observations.
The following snippet, however, runs without trouble:
from statsmodels.regression.linear_model import OLS
import numpy as np
X = np.array([[1,2,3],[4,7,5]])
y = np.array([1,2])
mod = OLS(X,y)
res = mod.fit()
print(res.params)
This prints the expected parameters. Is there a reason why summary (and, for example, f_test) throws an error when X has dimensions n by k with k > 1?
Reputation: 6475
You swapped the arguments to OLS. The constructor takes the dependent variable first, i.e. OLS(endog, exog), so try:
mod = OLS(y,X)
then everything should work as expected, namely:
from statsmodels.regression.linear_model import OLS
import numpy as np
X = np.array([[1,2,3],[4,7,5]])
y = np.array([1,2])
mod = OLS(y,X)
res = mod.fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                     0.000
Date:                Sun, 15 Mar 2020   Prob (F-statistic):                nan
Time:                        13:16:38   Log-Likelihood:                 68.110
No. Observations:                   2   AIC:                            -132.2
Df Residuals:                       0   BIC:                            -134.8
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0234        inf          0        nan         nan         nan
x2             0.0760        inf          0        nan         nan         nan
x3             0.2749        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.000
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.333
Skew:                           0.000   Prob(JB):                        0.846
Kurtosis:                       1.000   Cond. No.                         7.83
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
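To see why res.params still printed while summary failed, here is a rough NumPy-only sketch. It does not call statsmodels; the residuals helper is a hypothetical stand-in that uses np.linalg.lstsq to mimic the fit, on the assumption that the residuals simply inherit the shape of the dependent variable. With the arguments swapped, the residuals come out 2-D, and the np.dot(wresid, wresid) call shown in the traceback cannot multiply a (2,3) array by itself.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 7, 5]])  # 2 observations, 3 regressors
y = np.array([1, 2])                  # 2 observations


def residuals(endog, exog):
    """Least-squares residuals of endog regressed on exog.

    Hypothetical helper: a plain NumPy stand-in for the fit, not statsmodels code.
    """
    exog = exog.reshape(len(exog), -1)  # treat a 1-D exog as a single column
    coef, *_ = np.linalg.lstsq(exog, endog, rcond=None)
    return endog - exog @ coef


# Swapped order, as in the question: X is treated as the dependent variable.
swapped = residuals(X, y)
print(swapped.shape)  # 2-D residuals, one column per column of X

try:
    np.dot(swapped, swapped)  # same kind of call as in the traceback
    dot_failed = False
except ValueError as err:
    dot_failed = True
    print(err)

# Intended order: y is the dependent variable, X holds the regressors.
correct = residuals(y, X)
print(correct.shape)  # 1-D residuals, one per observation
ssr = np.dot(correct, correct)  # a scalar sum of squared residuals
print(ssr)
```

The fit itself succeeds in both orders, which is why printing the parameters gave no hint of the problem. Only the statistics that expect one residual per observation break. Note also that with 2 observations and 3 regressors the fit is exact, which matches the R-squared of 1.000 and the inf and nan entries in the summary above.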