thecuriouscat

Reputation: 59

Unable to fix ValueError("endog must be in the unit interval")

I was asked to write a program for logistic regression using the following steps.

  1. Load the R dataset biopsy from the MASS package.
  2. Capture the data as a pandas dataframe.
  3. Rename the column name class to Class.
  4. Transform the Class column values benign and malignant to '0' and '1' respectively.
  5. Build a logistic regression model with independent variable V1 and dependent variable Class.
  6. Fit the model with data, and display the pseudo R-squared value.

I've tried changing the values, but I'm not sure what to do. I'm also a beginner at statistics with Python.

import statsmodels.api as sa
import statsmodels.formula.api as sfa
biopsy = sa.datasets.get_rdataset("biopsy","MASS")
biopsy_data = biopsy.data
biopsy_data.rename(columns={"class":"Class"})
biopsy_data.Class = biopsy_data.Class.map({"benign":0,"malignant":1})
log_mod1 = sfa.logit("V1~Class",biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())

I expected a summary table, but the output is:

ValueError("endog must be in the unit interval.")

Upvotes: 2

Views: 7379

Answers (2)

Shahid

Reputation: 1

Change:

log_mod1 = sfa.logit("V1~Class",biopsy_data)

to:

log_mod1 = sfa.logit("Class~V1",biopsy_data)

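This works because the formula has the form dependent ~ independent. With "V1~Class", V1 (scores from 1 to 10) is treated as the response, and logit rejects it because the response must lie between 0 and 1.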
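A minimal sketch of the corrected formula, run on synthetic data instead of the real biopsy dataset so that it needs no download. The generated values, the random seed, and the name df are assumptions made for illustration only; prsquared is the results attribute that holds McFadden's pseudo R-squared.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sfa

# Synthetic stand-in for the biopsy data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
v1 = rng.integers(1, 11, size=200)             # scores from 1 to 10, like V1
p_malignant = 1 / (1 + np.exp(-(v1 - 5.5)))    # higher V1 -> more likely malignant
labels = np.where(rng.random(200) < p_malignant, "malignant", "benign")
df = pd.DataFrame({"V1": v1, "class": labels})

# rename() returns a new frame unless inplace=True, so keep the result.
df = df.rename(columns={"class": "Class"})
df["Class"] = df["Class"].map({"benign": 0, "malignant": 1})

# Dependent variable on the left of "~", independent variable on the right.
result = sfa.logit("Class ~ V1", data=df).fit(disp=0)
print(result.prsquared)
```

With the real data, the same two steps (keep the renamed frame, put Class on the left of the formula) should apply to biopsy.data.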

Upvotes: 0

Rickantonais

Reputation: 496

There are a few preprocessing steps you need to do. The error tells you that endog has to be in the unit interval, that is, between 0 and 1.

One way to get there is min-max feature scaling: (X - Xmin) / (Xmax - Xmin)
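As a small illustration of that scaling, here is a sketch on a handful of made-up values. The helper name min_max_scale and the sample numbers are hypothetical and not part of the original code.

```python
import pandas as pd

def min_max_scale(s: pd.Series) -> pd.Series:
    """Rescale a numeric Series to [0, 1] using (x - min) / (max - min)."""
    span = s.max() - s.min()
    if span == 0:
        # A constant column has no range to scale by.
        raise ValueError("cannot min-max scale a constant column")
    return (s - s.min()) / span

v1 = pd.Series([1, 3, 5, 10], name="V1")  # hypothetical sample values
scaled = min_max_scale(v1)
print(scaled.min(), scaled.max())
```

The smallest value maps to 0 and the largest to 1, so the scaled column satisfies the unit-interval check.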

Here is the code with the modifications; it should work:

import numpy as np
import statsmodels.api as sa
import statsmodels.formula.api as sfa
biopsy = sa.datasets.get_rdataset("biopsy","MASS")
biopsy_data = biopsy.data
# inplace=True is needed, otherwise the renamed frame is discarded
biopsy_data.rename(columns={"class":"Class"},inplace=True)
biopsy_data.Class = biopsy_data.Class.map({"benign":0,"malignant":1})
# min-max scale V1 to [0, 1]: (X - Xmin) / (Xmax - Xmin)
biopsy_data["V1"] = np.divide(biopsy_data["V1"] - biopsy_data["V1"].min(), biopsy_data["V1"].max() - biopsy_data["V1"].min())
# V1 is on the left-hand side here, so it is the endog that must lie in [0, 1]
log_mod1 = sfa.logit("V1~Class",biopsy_data)
log_res1 = log_mod1.fit()
print(log_res1.summary())

Before calling sfa.logit(), I simply rescale V1 to the unit interval, since V1 is on the left-hand side of the formula and is therefore the variable the error is about.

Upvotes: 2
