Reputation: 418
While using statsmodels, I am getting this weird error: ValueError: endog must be in the unit interval.
Can someone give me more information on this error? Google is not helping.
Code that produced the error:
"""
Multiple regression with dummy variables.
"""
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:,'days_2':])
data['intercept'] = 1.0
print(data)
train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])
result = logit.fit()
print(result.summary())
And the traceback:
Traceback (most recent call last):
File "multiple_regression_dummy.py", line 20, in <module>
logit = sm.Logit(data['Cost'], data[train_cols])
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/statsmodels/discrete/discrete_model.py", line 404, in __init__
raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.
Upvotes: 18
Views: 62401
Reputation: 21
I had the same problem. I fixed it by switching from a classification model to a regression model: I had been using Logit on a regression problem.
You can still use statsmodels, but with OLS, for example, instead of Logit. Logit (logistic regression) is for classification problems, and yours looks like a regression problem, since Cost appears to be a continuous value. Using OLS should resolve the error.
Upvotes: 1
Reputation: 31
It seems like you followed the same logistic regression tutorial that I did: http://blog.yhat.com/posts/logistic-regression-and-python.html
I ended up getting the same ValueError when I fit my logistic regression, and the trick I needed to get it running was to drop all rows of my data with missing values (N/A or np.nan).
This can be done with the pandas function pandas.notnull() as follows:
data = data[pd.notnull(data['Cost'])]
data = data[pd.notnull(data['R(t)'])]
...
and so on, until every variable has the same number of values to work with.
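The column-by-column filtering can probably be collapsed into a single DataFrame.dropna() call. Here is a small sketch on made-up data (the column names are borrowed from the question, the values are hypothetical). The in_unit_interval helper is my own, written to mimic what I understand the statsmodels check to do; it is not part of the library. It shows why a NaN in the target can trigger the error: comparisons against NaN are always False.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in both columns (hypothetical values).
data = pd.DataFrame({
    "Cost": [1.0, 0.0, np.nan, 1.0, 0.0],
    "R(t)": [0.3, np.nan, 0.1, 0.7, 0.2],
})


def in_unit_interval(series):
    """Hypothetical helper: True only if every value lies in [0, 1]."""
    return bool(((series >= 0) & (series <= 1)).all())


# NaN >= 0 evaluates to False, so one missing value fails the whole check.
print(in_unit_interval(data["Cost"]))   # False

# Drop every row that has a missing value in any of the listed columns.
clean = data.dropna(subset=["Cost", "R(t)"])

print(in_unit_interval(clean["Cost"]))  # True
print(len(data), "->", len(clean))      # 5 -> 3
```

Calling data.dropna() with no arguments drops rows with a missing value in any column, which is simpler when every column feeds the model.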
Hope this helps someone else!
Upvotes: 3
Reputation: 281
I got this error when my target column had values larger than 1. Make sure every value in your target column is between 0 and 1 (as logistic regression requires) and try again. For example, if you have a target column with values 1-5, make 4 and 5 the positive class and 1, 2, and 3 the negative class. Hope this helps.
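A minimal sketch of that recoding, using a made-up ratings column (the name and values are only for illustration):

```python
import pandas as pd

# Hypothetical target column with values 1-5.
ratings = pd.Series([1, 2, 3, 4, 5, 4, 2], name="rating")

# 4 and 5 become the positive class (1); 1, 2 and 3 become the negative class (0).
target = (ratings >= 4).astype(int)

print(target.tolist())  # [0, 0, 0, 1, 1, 1, 0]
```

The resulting 0/1 series is what you would pass as the first argument to sm.Logit. Where to put the cutoff is a modelling decision, not something the library picks for you.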
Upvotes: 28