Reputation: 1103
I am trying to run the following code.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7], train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6], test[:, 8]]
Xtest = np.nan_to_num(Xtest)
# model
lr = LogisticRegression()
lr.fit(X, y)
where y is a np.ndarray of 0s and 1s.
However, I receive the following error:
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
From the sklearn documentation, I see that
y : array-like, shape (n_samples,)
Target values (class labels in classification, real numbers in regression)
What is my error?
FYI, y is np.array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) whose size is (891,).
Upvotes: 93
Views: 215852
Reputation: 23489
The target variable in a logistic regression can be binary (e.g. np.random.randint(2, size=100)) or multiclass (e.g. np.random.randint(3, size=100)). This can be verified using sklearn.utils.multiclass.type_of_target. For example:
import numpy as np
from sklearn.utils.multiclass import type_of_target
y = np.random.randint(3, size=100)
type_of_target(y) # 'multiclass'
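If we look at the source code, a target can be of unknown type in 3 cases:
1. It has more than 2 dimensions, e.g. [[[1, 2]]]. Flattening it fixes this: y = np.ravel(y).
2. It is an empty 2D array, e.g. [[]].
3. It has object dtype and its first element is not a string, e.g. np.array([1, 2], dtype=object).
The y in the question has object dtype, so it falls under the last case. The fix is to convert it to a dtype other than object before model instantiation; even converting to a list works. All of the following should work: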
y = y.astype(str)
y = y.astype(float)
y = y.tolist()
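As a rough check of the object dtype case and the three conversions, here is a minimal sketch. The array y_obj is a hypothetical stand-in for the y in the question, not the actual data, and the 'unknown' result is what the description of the source code above predicts for such an array.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the question's y: 0.0/1.0 values stored with object dtype.
y_obj = np.array([0.0, 1.0, 1.0, 0.0], dtype=object)

# Object dtype whose first element is not a string: expected to be 'unknown'.
kind_before = type_of_target(y_obj)

# Each conversion hands scikit-learn a target it can classify.
kind_str = type_of_target(y_obj.astype(str))      # string labels '0.0' and '1.0'
kind_float = type_of_target(y_obj.astype(float))  # floats that hold whole numbers
kind_list = type_of_target(y_obj.tolist())        # plain Python floats

print(kind_before, kind_str, kind_float, kind_list)
```

Note that astype(str) turns the classes into the strings '0.0' and '1.0', so predictions come back as strings as well; astype(float) or astype(int) keeps numeric labels.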
Upvotes: 0
Reputation: 91
I got a similar error. It turned out that my target was a non-integer type. After casting the target variable to an integer type, the error went away:
y = train_data['Y'].astype('int')
Upvotes: 8
Reputation: 59
Adding to Miriam's answer: I also got a similar error, but in my case the individual elements of y_pred were of type 'np.int32' and the individual elements of y were of type 'int'.
I solved it by doing:
for i, x in enumerate(y_pred):
    y_pred[i] = x.astype('int')
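If y_pred is a list of NumPy scalars, the same conversion can probably be done in one call instead of a loop. This is only a sketch: the values below are made up, and it assumes y_pred is list-like and holds whole numbers.

```python
import numpy as np

# Hypothetical predictions stored element by element as NumPy scalars.
y_pred = [np.int32(1), np.int32(0), np.int32(1)]

# Build one array and cast it as a whole rather than casting each element.
y_pred = np.asarray(y_pred).astype(int)

print(y_pred.dtype, y_pred.tolist())
```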
Upvotes: 1
Reputation: 19664
Your y is of type object, so sklearn cannot recognize its type. Add the line y=y.astype('int') right after the line y = train[:, 1].
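To illustrate where the object dtype likely comes from and what the cast does, here is a minimal sketch. The DataFrame is a hypothetical miniature of train.csv with made-up column names and values; the assumption is that the real file also mixes text and numeric columns, which makes .values return an object array.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical miniature of train.csv: a text column next to numeric ones.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Label": [0, 1, 1, 0, 1, 0],
    "Name": ["a", "b", "c", "d", "e", "f"],
    "Score": [0.5, 2.1, 1.9, 0.2, 2.5, 0.1],
})

train = df.values              # mixed column types, so the array has object dtype
y = train[:, 1]                # slicing keeps the object dtype
y = y.astype('int')            # the fix: give the labels an integer dtype

X = train[:, [3]].astype(float)  # one numeric feature, cast away from object
lr = LogisticRegression()
lr.fit(X, y)                   # fits without the "Unknown label type" error

print(train.dtype, y.dtype, lr.classes_)
```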
Upvotes: 197