Edward Yu

Reputation: 418

ValueError: endog must be in the unit interval

While using statsmodels, I am getting this weird error: ValueError: endog must be in the unit interval. Can someone give me more information on this error? Google is not helping.

Code that produced the error:

"""
Multiple regression with dummy variables. 
"""

import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np

df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:, 'days_2':])  # .ix was removed in pandas 1.0; .loc does the same label slice
data['intercept'] = 1.0

print(data)

train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])

result = logit.fit()

print(result.summary())

And the traceback:

Traceback (most recent call last):
  File "multiple_regression_dummy.py", line 20, in <module>
    logit = sm.Logit(data['Cost'], data[train_cols])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/statsmodels/discrete/discrete_model.py", line 404, in __init__
    raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.

Upvotes: 18

Views: 62401

Answers (3)

Ignacio Arizna

Reputation: 21

I had the same problem. I fixed it by changing the model from a classification model to a regression model (I had been using a classification model, Logit, on a regression problem).

You can still use statsmodels, but with OLS, for example, instead of Logit. Logit (logistic regression) is for classification problems, and this looks like a regression problem. Switching to OLS could solve it.

Upvotes: 1

CodingCody

Reputation: 31

It seems like you followed the same logistic regression tutorial that I did: http://blog.yhat.com/posts/logistic-regression-and-python.html

I ended up getting the same ValueError when I fit my logistic regression, and the trick I needed to get it running was to drop every row of my data that had a missing value (N/A or np.nan).

This can be done with the pandas function pandas.notnull() as follows:

data = data[pd.notnull(data['Cost'])]

data = data[pd.notnull(data['R(t)'])]

...

and so on until all your variables have the same number of values to work with.
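A shorter way to filter several columns in one step is DataFrame.dropna with a subset. The sketch below uses a small made-up frame; only the column names 'Cost' and 'R(t)' come from the question. A NaN is neither >= 0 nor <= 1, so a missing value in the target would plausibly fail a unit-interval check, which may be why dropping those rows helps.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame with one missing value in each column.
data = pd.DataFrame({
    "Cost": [0.0, 1.0, np.nan, 1.0],
    "R(t)": [0.5, np.nan, 0.7, 0.9],
})

# Drop any row that has a missing value in either listed column.
clean = data.dropna(subset=["Cost", "R(t)"])

# NaN comparisons are always False, so a NaN target fails a range check.
has_bad_target = not ((data["Cost"] >= 0) & (data["Cost"] <= 1)).all()

print(len(data), "rows before,", len(clean), "rows after")
```

Calling data.dropna() with no arguments drops rows with a missing value in any column, which is handy once the dummy columns have been joined in.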

Hope this helps someone else!

Upvotes: 3

user5323012

Reputation: 281

I got this error when my target column had values larger than 1. Make sure every value in your target column lies between 0 and 1 (as logistic regression requires) and try again. For example, if your target column has values 1-5, make 4 and 5 the positive class and 1, 2, and 3 the negative class. Hope this helps.
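A minimal sketch of that recoding, using a made-up 'rating' series (the name and values are hypothetical) and a cutoff of 4 as in the example above:

```python
import pandas as pd

# Hypothetical 1-5 target values.
ratings = pd.Series([1, 2, 3, 4, 5, 4, 2], name="rating")

# 4 and 5 become the positive class (1); 1, 2 and 3 the negative class (0).
target = (ratings >= 4).astype(int)

# Quick sanity check before handing the column to sm.Logit.
in_unit_interval = bool(target.between(0, 1).all())
print(target.tolist(), in_unit_interval)
```

Where to put the cutoff is a modelling decision, not something the error message dictates.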

Upvotes: 28
