Reputation: 273
I would like to perform a simple logistic regression (one dependent and one independent variable) in Python. All of the documentation I see about logistic regression in Python is about using it to develop a predictive model. I would like to use it more from the statistics side. How do I find the odds ratio, p-value, and confidence interval of a simple logistic regression in Python?
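from sklearn.linear_model import LogisticRegression

X = df[[predictor]]  # double brackets: sklearn expects a 2-D feature array
y = df[binary_outcome]
model = LogisticRegression()
model.fit(X, y)
# print(model_stats)  # placeholder for the statistics I want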
with an ideal output of the odds ratio, p-value, and confidence interval.
Upvotes: 2
Views: 5116
Reputation: 46908
I assume you are using LogisticRegression() from sklearn. It does not report p-values or confidence intervals. You can use statsmodels instead. Note that statsmodels without formulas behaves a bit differently from sklearn (see comments by @Josef): it does not add an intercept automatically, so you need to add one using sm.add_constant():
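import numpy as np
import statsmodels.api as sm

# Toy data: random binary outcome and a standard-normal predictor
y = np.random.choice([0,1],50)
x = np.random.normal(0,1,50)
# add_constant() prepends the intercept column
model = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial())
results = model.fit()
print(results.summary())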
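                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            GLM   Df Residuals:                       48
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -33.125
Date:                Sat, 09 Jan 2021   Deviance:                       66.250
Time:                        16:21:51   Pearson chi2:                     50.1
No. Iterations:                     4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0908      0.309     -0.294      0.769      -0.696       0.514
x1             0.5975      0.361      1.653      0.098      -0.111       1.306
==============================================================================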
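The coefficients are on the log-odds scale, so exponentiating them gives odds ratios; for example, exp(0.5975) ≈ 1.82 for x1. The [0.025 0.975] columns are the bounds of the 95% confidence interval for the log odds, and the P>|z| column holds the p-values. Check out the help page for more info.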
Upvotes: 4