Reputation: 5117
I am running the following source code:
import numpy as np
import statsmodels.formula.api as sm
# Add one column of ones for the intercept term
X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)
regressor_OLS = sm.OLS(endog=y, exog=X).fit()
print(regressor_OLS.summary())
where X is a 50x5 (before adding the intercept term) numpy array which looks like this:
[[0 1 165349.20 136897.80 471784.10]
[0 0 162597.70 151377.59 443898.53]...]
and y is a 50x1 numpy array with float values for the dependent variable.
The first two columns of X encode a dummy variable with three different values. The remaining three columns are three different independent variables.
Although it is said that statsmodels.formula.api.OLS automatically adds an intercept term (see @stellacia's answer here: OLS using statsmodel.formula.api versus statsmodel.api), its summary does not show the statistics for the intercept term, as is evident below in my case:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     727.1
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           7.87e-42
Time:                        21:40:23   Log-Likelihood:                -545.15
No. Observations:                  50   AIC:                             1100.
Df Residuals:                      45   BIC:                             1110.
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3464.4536   4905.406      0.706      0.484   -6415.541    1.33e+04
x2          5067.8937   4668.238      1.086      0.283   -4334.419    1.45e+04
x3             0.7182      0.066     10.916      0.000       0.586       0.851
x4             0.3113      0.035      8.885      0.000       0.241       0.382
x5             0.0786      0.023      3.429      0.001       0.032       0.125
==============================================================================
Omnibus:                        1.355   Durbin-Watson:                   1.288
Prob(Omnibus):                  0.508   Jarque-Bera (JB):                1.241
Skew:                          -0.237   Prob(JB):                        0.538
Kurtosis:                       2.391   Cond. No.                     8.28e+05
==============================================================================
For this reason, I added the following line to my source code:
X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)
as you can see at the beginning of my post. With it, the statistics for the intercept/constant are shown, as below:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     169.9
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           1.34e-27
Time:                        20:25:21   Log-Likelihood:                -525.38
No. Observations:                  50   AIC:                             1063.
Df Residuals:                      44   BIC:                             1074.
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.013e+04   6884.820      7.281      0.000    3.62e+04     6.4e+04
x1           198.7888   3371.007      0.059      0.953   -6595.030    6992.607
x2           -41.8870   3256.039     -0.013      0.990   -6604.003    6520.229
x3             0.8060      0.046     17.369      0.000       0.712       0.900
x4            -0.0270      0.052     -0.517      0.608      -0.132       0.078
x5             0.0270      0.017      1.574      0.123      -0.008       0.062
==============================================================================
Omnibus:                       14.782   Durbin-Watson:                   1.283
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.266
Skew:                          -0.948   Prob(JB):                     2.41e-05
Kurtosis:                       5.572   Cond. No.                     1.45e+06
==============================================================================
Why are the statistics for the intercept not shown when I do not add an intercept term myself, even though it is said that statsmodels.formula.api.OLS adds one automatically?
Upvotes: 2
Views: 4601
Reputation: 589
"No constant is added by the model unless you are using formulas." Therefore try something like below example. Variable names should be defined according to your data set.
Use,
regressor_OLS = smf.ols(formula='Y_variable ~ X_variable', data=df).fit()
instead of,
regressor_OLS = sm.OLS(endog=y, exog=X).fit()
Upvotes: 5