I am creating a program that uses past datasets to predict an employee's salary for any job. I receive the following warning: "Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=5."
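import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.25, random_state=1)
model = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 25)}
modelGSCV = GridSearchCV(model, param_grid, cv=5)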
This is the code where I split the data and received the warning. I am fairly new to machine learning, so I would appreciate any guidance on how to fix this.
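A minimal sketch that reproduces the same warning, assuming made-up data (the arrays below are hypothetical, not the asker's dataset) in which one class label appears only once, much like a salary value that occurs a single time. It fits on the full data directly so the singleton class is guaranteed to reach the cross-validation step:

```python
import warnings

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 40 samples, one feature, and a target in which
# class 99 appears only once (like a salary value seen a single time).
rng = np.random.default_rng(0)
predictors = rng.normal(size=(40, 1))
target = np.array([0] * 20 + [1] * 19 + [99])

model = KNeighborsClassifier()
param_grid = {'n_neighbors': np.arange(1, 5)}
modelGSCV = GridSearchCV(model, param_grid, cv=5)

# Record every warning emitted while fitting the grid search.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    modelGSCV.fit(predictors, target)

messages = [str(w.message) for w in caught]
print(any("least populated class" in m for m in messages))
```

The exact wording of the warning differs between scikit-learn versions, so the check only looks for the common phrase "least populated class".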
From the GridSearchCV documentation:
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
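To see which splitter an integer `cv` turns into, scikit-learn's `check_cv` helper applies this same rule. A small sketch with hypothetical label arrays:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Hypothetical targets: discrete class labels vs. continuous values.
y_multiclass = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
y_continuous = np.array([31000.5, 42000.0, 55000.25, 38000.0, 61000.75])

# cv=5 with a classifier and a multiclass target -> StratifiedKFold
cv_clf = check_cv(5, y_multiclass, classifier=True)
# cv=5 for a non-classifier (e.g. a regressor) -> plain KFold
cv_reg = check_cv(5, y_continuous, classifier=False)

print(type(cv_clf).__name__)  # StratifiedKFold
print(type(cv_reg).__name__)  # KFold
```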
You must have a multiclass classification problem. Because StratifiedKFold is used, each class is spread across the 5 folds, so you need at least 5 examples of every class in your data. If any class has fewer than 5 examples, this warning is raised; in your case, the least populated class has only 1.
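To find which classes trigger the warning, you can count the examples per class. A minimal sketch; the helper name `rare_classes` and the sample targets are hypothetical:

```python
from collections import Counter

import numpy as np


def rare_classes(y, n_splits=5):
    """Return {class: count} for classes with fewer than n_splits members."""
    counts = Counter(np.asarray(y).tolist())
    return {label: n for label, n in counts.items() if n < n_splits}


# Hypothetical salary values used as class labels.
t_train = np.array([30000, 30000, 30000, 30000, 30000, 45000, 45000, 52000])
print(rare_classes(t_train))  # {45000: 2, 52000: 1}
```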
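A simple solution would be to drop classes with fewer than 5 examples, or to reduce the number of folds so that it does not exceed the size of the smallest class. Since cv must be at least 2, a class with only a single example has to be dropped.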
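Both fixes can be sketched as follows, assuming hypothetical toy data in which class 2 has only 3 examples. The helper `drop_rare_classes` is illustrative, not part of scikit-learn:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def drop_rare_classes(X, y, min_count):
    """Keep only rows whose class occurs at least min_count times in y."""
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    keep_labels = labels[counts >= min_count]
    mask = np.isin(y, keep_labels)
    return np.asarray(X)[mask], y[mask]


# Hypothetical training data: class 2 has only 3 examples.
rng = np.random.default_rng(1)
p_train = rng.normal(size=(33, 2))
t_train = np.array([0] * 15 + [1] * 15 + [2] * 3)
param_grid = {'n_neighbors': np.arange(1, 10)}

# Option 1: drop classes with fewer than 5 examples and keep cv=5.
X_kept, y_kept = drop_rare_classes(p_train, t_train, min_count=5)
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_kept, y_kept)

# Option 2: keep all classes but lower cv to the smallest class size
# (this only works if that size is at least 2).
counts = np.unique(t_train, return_counts=True)[1]
n_folds = min(5, int(counts.min()))  # 3 for this data
grid_all = GridSearchCV(KNeighborsClassifier(), param_grid, cv=n_folds)
grid_all.fit(p_train, t_train)
```

Option 1 keeps more folds but discards data for the rare classes; option 2 keeps every class at the cost of fewer, larger folds.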