Reputation: 20101
I have a problem very similar to this question, and the solution there works for the training data. Now I'm trying to get the confidence interval for the predicted (forecast) data:
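import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
# define y, X, X_forecast as pandas DataFrames
regressor = sm.OLS(y, X).fit()
# this call fails: it receives an array of predictions instead of the fitted results
wls_prediction_std(regressor.predict(X_forecast))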
But, of course, this raises an error complaining that the result of regressor.predict is an array. How can I calculate the confidence interval for the predicted regression values?
Upvotes: 1
Views: 2205
Reputation: 579
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = sm.datasets.longley.load()
x = sm.add_constant(data.exog.iloc[:,2])
y = data.endog
mod = sm.OLS(y, x).fit()
print(mod.summary(alpha=0.01))
print(mod.conf_int(alpha=0.01, cols=None))
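# alpha=0.01 above gives 99 % intervals
# get_prediction() without arguments gives in-sample predictions;
# pass new exog (e.g. mod.get_prediction(x_new)) to get intervals for forecasts
pred_ols = mod.get_prediction()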
# mean confidence interval
iv_l = pred_ols.summary_frame()["mean_ci_lower"]
iv_u = pred_ols.summary_frame()["mean_ci_upper"]
# prediction interval
##iv_l = pred_ols.summary_frame()["obs_ci_lower"]
##iv_u = pred_ols.summary_frame()["obs_ci_upper"]
fig, ax = plt.subplots(figsize=(8, 6))
x1 = x.iloc[:, 1]
ax.plot(x1, y, "bo", label="data")
ax.plot(x1, mod.fittedvalues, "r--.", label="OLS")
##ax.plot(x1, pred_ols.summary_frame()["mean"], "b--.", label="OLS")
ax.plot(x1, iv_u, "ro")
ax.plot(x1, iv_l, "ro")
ax.legend(loc="best")
plt.show()
pred_ols = mod.get_prediction()
# prediction interval
iv_l = pred_ols.summary_frame()["obs_ci_lower"]
iv_u = pred_ols.summary_frame()["obs_ci_upper"]
fig, ax = plt.subplots(figsize=(8, 6))
x1 = x.iloc[:, 1]
ax.plot(x1, y, "bo", label="data")
ax.plot(x1, mod.fittedvalues, "r--.", label="OLS")
ax.plot(x1, iv_u, "ro")
ax.plot(x1, iv_l, "ro")
ax.legend(loc="best")
plt.show()
P.S. See also: confidence-and-prediction-intervals
Upvotes: 1
Reputation: 11
You may have passed the wrong argument: wls_prediction_std expects the fitted results object, not the predictions.
Try this instead:
predstd, iv_l, iv_u = wls_prediction_std(regressor, exog=X_forecast, weights=None, alpha=0.05)
Upvotes: 1