Reputation: 121
I have a question about logistic regression: I am getting a ValueError when I try to cross-validate the model.
Here's my dataset:
sub1 sub2 sub3 sub4
pol_1 0.000000 0.000000 0.0 0.000000
pol_2 0.000000 0.000000 0.0 0.000000
pol_3 0.050000 0.000000 0.0 0.000000
pol_4 0.000000 0.000000 0.0 0.000000
pol_5 0.000000 0.000000 0.0 0.000000
pol_6 0.000000 0.000000 0.0 0.000000
pol_7 0.000000 0.000000 0.0 0.000000
pol_8 0.000000 0.000000 0.0 0.000000
pol_9 0.000000 0.000000 0.0 0.000000
pol_10 0.000000 0.000000 0.0 0.032423
pol_11 0.000000 0.000000 0.0 0.000000
pol_12 0.000000 0.000000 0.0 0.000000
pol_13 0.000000 0.000000 0.0 0.000000
pol_14 0.000000 0.053543 0.0 0.000000
pol_15 0.000000 0.000000 0.0 0.000000
pol_16 0.000000 0.000000 0.0 0.000000
pol_17 0.000000 0.000000 0.0 0.000000
pol_18 0.000000 0.000000 0.0 0.053453
pol_19 0.000000 0.058344 0.0 0.000000
pol_20 0.054677 0.000000 0.0 0.000000
This is my code:
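from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

array = df.values
X = array[:, 0:3]  # first three columns as features
Y = array[:, 3]    # fourth column as the target
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)
scoring = 'accuracy'
# shuffle=True is needed in newer scikit-learn versions when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
cv_results = model_selection.cross_val_score(
    LogisticRegression(), X_train, Y_train, cv=kfold, scoring=scoring)
print(cv_results)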
This gives me the following error:
ValueError: Unknown label type: 'continuous'
How can this issue be tackled?
I also looked through some related posts and found that the issue could be related to the data type, which in my case is:
print(df.dtypes)
print(X_train.dtype)
pol_1 float64
pol_2 float64
pol_3 float64
pol_4 float64
pol_5 float64
pol_6 float64
pol_7 float64
pol_8 float64
pol_9 float64
pol_10 float64
pol_11 float64
pol_12 float64
pol_13 float64
pol_14 float64
pol_15 float64
pol_16 float64
pol_17 float64
pol_18 float64
pol_19 float64
pol_20 float64
Length: 20, dtype: object
float64
I tried converting the data type of X_train and Y_train to string, but got the same error.
Thanks!
Upvotes: 1
Views: 1555
Reputation: 19664
Y should hold discrete class labels, typically integers, one per class. In your data frame the Y column holds continuous floats (for example 0.032423), so scikit-learn treats it as a regression target rather than a set of classes, and hence you get this error.
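As a rough sketch of what that looks like in practice, the snippet below reproduces the error on made-up data and then fits the model after turning the target into integer labels. The toy arrays and the rule "any non-zero value is class 1" are assumptions for illustration only; pick whatever labelling actually matches your problem. `type_of_target` is the scikit-learn helper that classifies a target array as continuous, binary, and so on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Toy stand-in for the question's data (not the real data frame):
# 40 rows, 3 feature columns, values evenly spaced in [0, 1].
X = np.linspace(0.0, 1.0, 120).reshape(40, 3)

# A mostly-zero target with small non-integer floats, like the Y column.
y_continuous = np.where(X[:, 0] > 0.5, X[:, 0] * 0.1, 0.0)
print(type_of_target(y_continuous))

# Fitting a classifier on this target raises the ValueError from the question.
raised = False
try:
    LogisticRegression().fit(X, y_continuous)
except ValueError as err:
    raised = True
    print(err)

# Assumption: treat any non-zero value as the positive class.
y_labels = (y_continuous > 0).astype(int)
print(type_of_target(y_labels))

# With integer class labels the classifier fits without complaint.
model = LogisticRegression()
model.fit(X, y_labels)
predictions = model.predict(X)
print(predictions[:5])
```

If the fourth column is really a quantity you want to predict as a number, a regressor (for example `LinearRegression`) is probably the better fit than binarizing it.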
Upvotes: 1