Aditya S

Reputation: 53

How to fix "The least populated class in y has only one member" in scikit-learn

I am creating a program that uses past datasets to predict an employee's salary for any job. I receive the warning "Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=5."

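import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.25, random_state=1)
model = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 25)}
modelGSCV = GridSearchCV(model, param_grid, cv=5)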

This is where I tried splitting the data and received the warning. I am pretty new to machine learning, so I would appreciate any guidance on how to fix this.

Upvotes: 0

Views: 7661

Answers (1)

Sesquipedalism

Reputation: 1733

From the GridSearchCV documentation:

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

You must have a multiclass classification problem. Since StratifiedKFold is used with cv=5, stratification tries to place members of every class in each of the 5 folds, so you need at least 5 examples of each class in your data. If at least one class has fewer than 5 examples, this warning is raised.
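To see which classes trigger the warning, it can help to count the members of each class before running the search. The following is a minimal sketch using only the standard library; the helper name `rare_classes` and the sample labels are hypothetical and not taken from the question.

```python
from collections import Counter

def rare_classes(labels, n_splits=5):
    """Return {label: count} for every label with fewer than n_splits members."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_splits}

# Hypothetical target values; in the question this would be the
# target passed to the grid search (for example t_train).
sample_target = [30000] * 6 + [45000] * 5 + [52000] * 1
print(rare_classes(sample_target))
```

If the target is a salary, it is plausible that many values occur only once, in which case most labels would show up in this report.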

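A simple solution would be to drop classes with fewer than 5 examples or to reduce the number of folds. Note that cv cannot go below 2, so reducing the number of folds alone will not silence the warning for a class that has a single member.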
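Dropping the under-populated classes could look roughly like the sketch below. It uses only the standard library and works on plain sequences; `drop_rare_classes`, `rows`, and `labels` are hypothetical names chosen for illustration, and the threshold of 5 is assumed to match cv=5.

```python
from collections import Counter

def drop_rare_classes(rows, labels, min_count=5):
    """Keep only the samples whose label occurs at least min_count times."""
    counts = Counter(labels)
    kept = [(row, label) for row, label in zip(rows, labels)
            if counts[label] >= min_count]
    kept_rows = [row for row, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_rows, kept_labels

# Hypothetical data: class "c" has a single member and gets removed.
rows = [[i] for i in range(12)]
labels = ["a"] * 6 + ["b"] * 5 + ["c"]
filtered_rows, filtered_labels = drop_rare_classes(rows, labels)
print(len(filtered_rows), sorted(set(filtered_labels)))
```

The filter would need to be applied to whatever data is actually passed to the grid search, because the class counts that matter are the ones GridSearchCV sees when it builds its folds.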

Upvotes: 1
