Reputation: 25
I would like to conduct Logistic Regression in Python.
My reference in R is
model_1 <- glm(status_1 ~., data = X_train, family=binomial)
summary(model_1)
I'm trying to convert this to Python, but I'm not sure how to include all variables.
import statsmodels.api as sm
model = sm.formula.glm("status_1 ~ ", family=sm.families.Binomial(), data=train).fit()
print(model.summary())
How can I use all variables? That is, what should go after status_1 ~ in the formula?
Upvotes: 1
Views: 1502
Reputation: 41
From your question, I understand that you have binomial data and want to fit a Generalised Linear Model with logit as the link function. Also, as you can see in this thread (jseabold's answer), the feature you mentioned (the R-style ~ . shorthand for "all other columns") doesn't exist in patsy yet. So I will show you how to fit a Generalised Linear Model to binomial data using the sm.GLM() function.
#Imports
import numpy as np
import pandas as pd
import statsmodels.api as sm
#Suppose that your train data is in a dataframe called data_train
#Let's split the data into dependent and independent variables
At this point, note that the dependent variable can be a 2d array with two columns, as the help for the statsmodels GLM function describes. If your response is a single 0/1 column such as status_1, you can pass that column as y instead.
Binomial family models accept a 2d array with two columns. If supplied, each observation is expected to be [success, failure].
#Let's create the array which holds the dependent variable
y = data_train[["the name of the column of successes","the name of the column of failures"]]
#Let's create the array which holds the independent variables
X = data_train.drop(columns = ["the name of the column of successes","the name of the column of failures"])
#We have to add a constant in the array of the independent variables because by default constants
#aren't included in the model
X = sm.add_constant(X)
#It's time to create our model
logit_model = sm.GLM(
endog = y,
exog = X,
family = sm.families.Binomial(link=sm.families.links.Logit())).fit()
#Let's see some information about our model
print(logit_model.summary())
Upvotes: 0
Reputation: 1760
statsmodels makes it pretty straightforward to do logistic regression, as such:
import statsmodels.api as sm
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
log_reg = sm.Logit(ytrain, Xtrain).fit()
Where gmat, gpa and work_experience are your independent variables.
Upvotes: 2