I was asked to write a logistic regression program following the steps below.
I've tried changing the values, but I'm not sure what to do. I'm also a beginner at statistics in Python.
import statsmodels.api as sa
import statsmodels.formula.api as sfa
biopsy = sa.datasets.get_rdataset("biopsy","MASS")
biopsy_data = biopsy.data
biopsy_data = biopsy_data.rename(columns={"class": "Class"})  # rename() returns a new DataFrame, so assign it back
biopsy_data.Class = biopsy_data.Class.map({"benign":0,"malignant":1})
log_mod1 = sfa.logit("V1~Class",biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())
I expected a regression summary table, but instead I get:
ValueError: endog must be in the unit interval.
Change:
log_mod1 = sfa.logit("V1~Class",biopsy_data)
to:
log_mod1 = sfa.logit("Class~V1",biopsy_data)
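This works because, in a formula, the variable on the left of ~ is the dependent (endog) variable, and a logit model requires it to lie in the unit interval [0, 1]. Class is the 0/1 outcome, whereas V1 is a score from 1 to 10.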
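To see why swapping the two sides helps, here is a plain-Python sketch that mimics the condition behind the error message. The helpers `endog_name` and `in_unit_interval` are hypothetical and are not statsmodels code, and the V1 values are illustrative only (V1 in the biopsy data is scored from 1 to 10).

```python
def endog_name(formula):
    """Return the variable left of '~', which a formula treats as the dependent (endog) variable."""
    lhs, _rhs = formula.split("~", 1)
    return lhs.strip()


def in_unit_interval(values):
    """Return True if every value lies in the closed interval [0, 1]."""
    return all(0 <= v <= 1 for v in values)


# Illustrative values only: V1 is a 1-10 score, Class is already mapped to 0/1.
data = {
    "V1": [5, 3, 1, 8, 10],
    "Class": [0, 0, 0, 1, 1],
}

for formula in ("V1~Class", "Class~V1"):
    name = endog_name(formula)
    print(formula, "-> endog:", name, "| in [0, 1]:", in_unit_interval(data[name]))
# V1~Class -> endog: V1 | in [0, 1]: False
# Class~V1 -> endog: Class | in [0, 1]: True
```

Only the formula with Class on the left passes the check, which is why "Class~V1" fits while "V1~Class" raises the ValueError.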
The error tells you that the dependent variable (endog) must lie in the unit interval, i.e. between 0 and 1, so some preprocessing is needed.
One option is min-max feature scaling: (X - Xmin) / (Xmax - Xmin)
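A minimal plain-Python sketch of that formula (the function name `min_max_scale` is hypothetical). The parentheses matter: without them, division binds tighter than subtraction and the result is no longer confined to [0, 1].

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so the denominator would be zero.
        raise ValueError("cannot min-max scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([1, 4, 10]))  # [0.0, 0.3333333333333333, 1.0]
```

Applied to a 1-10 score such as V1, the minimum 1 maps to 0.0 and the maximum 10 maps to 1.0.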
Here is the modified code; it should run:
import numpy as np  # needed for np.divide below
import statsmodels.api as sa
import statsmodels.formula.api as sfa

biopsy = sa.datasets.get_rdataset("biopsy", "MASS")
biopsy_data = biopsy.data
biopsy_data.rename(columns={"class": "Class"}, inplace=True)
biopsy_data.Class = biopsy_data.Class.map({"benign": 0, "malignant": 1})
# Min-max scale V1 into [0, 1] so it passes the unit-interval check
v1 = biopsy_data["V1"]
biopsy_data["V1"] = np.divide(v1 - v1.min(), v1.max() - v1.min())
log_mod1 = sfa.logit("V1~Class", biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())
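In short, before calling sfa.logit() I min-max scaled V1, the variable you wanted to use.
Note that in "V1~Class", V1 is on the left-hand side, so statsmodels treats it as the dependent (endog) variable, not the independent one.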