hanzgs

Reputation: 1616

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1

I am getting the error

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1

A similar question exists (ValueError: Cannot have number of splits n_splits=3 greater than the number of samples: 1), but it did not resolve the issue for me.

With Python 3.8.6 and scikit-learn==0.23.2 everything worked fine.

After updating to Python 3.9.5 with scikit-learn==0.24.2, I get this error. I have 191 samples in X_test, and I am unsure why the library version would cause this issue.

I am using cv=3 on a dataset of 1000 records in total.

Full Code

from sklearn.model_selection import train_test_split
from imblearn.combine import SMOTEENN
from lightgbm import LGBMClassifier
# CVGrid, FE_Hyperparamerters, deviceType and ESR are defined elsewhere

X_train, X_test, y_train, y_test = train_test_split(features,
                                                    labels,
                                                    test_size=0.2,
                                                    random_state=200)
smote_enn = SMOTEENN(sampling_strategy='all', random_state=127)
X_train_senn, y_train_senn = smote_enn.fit_resample(X_train, y_train)
lgbmclassifier = LGBMClassifier(boosting_type='gbdt',
                                max_depth=-1,
                                device_type=deviceType,
                                verbose=0,
                                objective='binary',
                                class_weight='balanced',
                                force_row_wise=True,
                                subsample_for_bin=200000,
                                min_child_samples=20,
                                random_state=50)
lgbmgrid = CVGrid(lgbmclassifier, FE_Hyperparamerters)
lgbmgrid_result = lgbmgrid.fit(X_train,
                               y_train,
                               eval_metric='auc',
                               eval_set=[(X_test, y_test)],
                               early_stopping_rounds=ESR,
                               verbose=1)
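For comparison, here is a minimal sketch that runs without the error. `CVGrid` and `FE_Hyperparamerters` are the asker's own names; assuming `CVGrid` forwards its arguments to sklearn's `GridSearchCV` (whose `fit` appears in the traceback), passing `cv` as an explicit integer of at least 2 avoids any path where it resolves to 1. The classifier and grid here are stand-ins, not the asker's setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data in place of the asker's features/labels.
X, y = make_classification(n_samples=100, random_state=0)

# cv must be an int >= 2 (or a CV splitter object); cv=1 triggers the
# exact ValueError shown in the traceback.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.1, 1.0]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)
```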

Error

 File "C:\prg\utils.py", line 851, in feHPTuning
    lgbmgrid_result = lgbmgrid.fit(X_train,
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\model_selection\_search.py", line 762, in fit
    cv_orig = check_cv(self.cv, y, classifier=is_classifier(estimator))
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\model_selection\_split.py", line 2062, in check_cv
    return StratifiedKFold(cv)
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\model_selection\_split.py", line 636, in __init__
    super().__init__(n_splits=n_splits, shuffle=shuffle,
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\prg\Anaconda3\envs\automl_py395elk7120_2\lib\site-packages\sklearn\model_selection\_split.py", line 280, in __init__
    raise ValueError(
ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1.

The fit function raises the error.

Upvotes: 0

Views: 3380

Answers (1)

Antoine Dubuis

Reputation: 5304

This error is pretty straightforward: you cannot perform a k-fold split with only one split.

The KFold documentation states that n_splits is the number of folds and must be at least 2.

If you only want a single train/test split, use sklearn.model_selection.train_test_split instead.
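A minimal sketch (toy data, not the asker's dataset) showing why the error occurs and the two valid alternatives:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# n_splits=1 raises the exact ValueError from the traceback, at construction:
try:
    KFold(n_splits=1)
except ValueError as err:
    print(err)

# n_splits >= 2 works and yields that many train/test splits:
kf = KFold(n_splits=3)
print(len(list(kf.split(X))))

# For a single split, use train_test_split instead:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)
```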

Upvotes: 1
