Reputation: 513
When I ran the following:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
clf = SVC(kernel='rbf', probability=True)
clf.fit(x_train, y_train)
I received the ValueError: The number of classes has to be greater than one; got 1 class
I did print(np.unique(y_train)), which returned [0].
Can anyone point me in the right direction for a solution?
Upvotes: 3
Views: 3668
Reputation: 4273
Using stratify with train_test_split makes this less likely:
train_test_split(X, y, stratify=y)
Explanation: train_test_split shuffles the data randomly, so it can produce splits where y_train does not contain both positive and negative examples, in which case we cannot train a discriminative classifier:
from sklearn.model_selection import train_test_split
import numpy as np
X = np.ones((8, 2))
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
_, _, y_train, y_test = train_test_split(X, y, random_state=33)
# y_train: [0 0 0 0 0 0] # <--- Uh oh, there are no 1s in the training set!
# y_test: [1 1]
Stratification splits each class separately, so both subsets preserve the original class proportions and the training data contains at least one example of each label:
_, _, y_train, y_test = train_test_split(X, y, stratify=y)
# y_train: [0 0 0 0 1 0]
# y_test: [0 1]
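One caveat: stratification needs at least two samples per class, one for each subset. With a single positive example (toy data below, same setup as above), train_test_split fails loudly rather than silently producing a one-class y_train:
from sklearn.model_selection import train_test_split
import numpy as np
X = np.ones((8, 2))
y = np.array([0, 0, 0, 0, 0, 0, 0, 1])  # only one positive example
try:
    train_test_split(X, y, stratify=y)
except ValueError as e:
    # Message is along the lines of: "The least populated class in y
    # has only 1 member, which is too few."
    print(e)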
Upvotes: 0
Reputation: 335
Either your y list contains no 1's, or the 1's in y are few enough that y_train may end up containing none of them. Print y to check; if it does contain 1's, change your splitting strategy so that every class appears at least once in both y_train and y_test, as in the sketch below.
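A minimal sketch of that check, assuming x and y are the NumPy arrays from the question:
import numpy as np
from sklearn.model_selection import train_test_split

# See which classes y actually contains, and how often.
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels, counts)))  # e.g. {0: 98, 1: 2}

# If both classes are present, a stratified split keeps each one
# represented in y_train and y_test in roughly the original proportions.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y)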
Upvotes: 1