Unable to fix ValueError("endog must be in the unit interval")

Question

I was asked to write a program for logistic regression using the following steps.

Load the R dataset biopsy from the MASS package.
Capture the data as a pandas dataframe.
Rename the column name class to Class.
Transform the Class column values benign and malignant to '0' and '1' respectively.
Build a logistic regression model with independent variable V1 and dependent variable Class.
Fit the model with data, and display the pseudo R-squared value

I've tried changing the values, but I am not sure what to do. I am also a beginner at statistics with Python.

import statsmodels.api as sa
import statsmodels.formula.api as sfa
biopsy = sa.datasets.get_rdataset("biopsy","MASS")
biopsy_data = biopsy.data
biopsy_data.rename(columns={"class":"Class"})
biopsy_data.Class = biopsy_data.Class.map({"benign":0,"malignant":1})
log_mod1 = sfa.logit("V1~Class",biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())

I expected a table of the values but the output is a

ValueError("endog must be in the unit interval.")

Rickantonais · Accepted Answer

You need a preprocessing step first. The error says that endog, the variable on the left of ~ in the formula (V1 in your code), must lie in the unit interval, i.e. between 0 and 1.

What you can do is min-max feature scaling: (X - Xmin) / (Xmax - Xmin)
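
As a minimal sketch of that scaling formula, assuming a made-up series of scores (the name `scores` and its values are hypothetical, not the real V1 column):

```python
import pandas as pd

# Hypothetical scores standing in for a column like V1 (not the real biopsy data).
scores = pd.Series([1, 3, 5, 10], name="V1")

# Min-max scaling: subtract the minimum, then divide by the range.
scaled = (scores - scores.min()) / (scores.max() - scores.min())

print(scaled.tolist())
```

The smallest value maps to 0 and the largest to 1, so every scaled value ends up inside the unit interval.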

Here is the code with that modification; it should work:

import numpy as np
import statsmodels.api as sa
import statsmodels.formula.api as sfa
biopsy = sa.datasets.get_rdataset("biopsy","MASS")
biopsy_data = biopsy.data
# rename() returns a new dataframe unless inplace=True is passed
biopsy_data.rename(columns={"class":"Class"},inplace=True)
biopsy_data.Class = biopsy_data.Class.map({"benign":0,"malignant":1})
# min-max scale V1 into the unit interval [0, 1]
biopsy_data["V1"] = np.divide(biopsy_data["V1"] - biopsy_data["V1"].min(), biopsy_data["V1"].max() - biopsy_data["V1"].min())
log_mod1 = sfa.logit("V1~Class",biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())

Before calling sfa.logit(), I simply preprocessed V1, the variable you wanted to use, so that it lies between 0 and 1.
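
For comparison, the task statement in the question lists Class as the dependent variable and V1 as the independent one, which in formula terms would read "Class ~ V1" rather than "V1~Class". A minimal sketch of that reading, assuming synthetic stand-in data (the `demo` frame and the probabilities used to generate it are made up, not the biopsy dataset) and a Class column that already holds integer 0/1 values:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sfa

# Synthetic stand-in for the biopsy data (assumption: V1 is an integer score
# from 1 to 10 and Class is already mapped to 0/1). Not the real dataset.
rng = np.random.default_rng(0)
v1 = rng.integers(1, 11, size=300)
# Higher V1 makes Class == 1 more likely.
prob = 1.0 / (1.0 + np.exp(-(v1 - 5.5)))
demo = pd.DataFrame({"V1": v1, "Class": rng.binomial(1, prob)})

# Dependent variable on the left of "~", independent variable on the right.
result = sfa.logit("Class ~ V1", demo).fit(disp=False)

# Pseudo R-squared of the fitted model.
pseudo_r2 = result.prsquared
print(pseudo_r2)
```

With this orientation the endog column is already 0/1, so no scaling is needed, and `prsquared` is the pseudo R-squared value that the summary table also reports.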
