Reputation: 368
I want to perform stratified 10-fold cross validation using sklearn. The train and test indices can be obtained using
from sklearn.model_selection import StratifiedKFold
kf = StratifiedKFold(n_splits=10)
for fold, (train_index, test_index) in enumerate(kf.split(X, y), 1):
X_train = X[train_index]
y_train = y[train_index]
X_test = X[test_index]
y_test = y[test_index]
However, I would like to set not one, but two folds aside (one for tuning of hyperparameters). So, I want each iteration to consist of 8 folds for training, 1 for tuning and 1 for testing. Is this possible with sklearns StratifiedKFold? Or would I need to write a custom split method?
Upvotes: 1
Views: 794
Reputation: 88236
You could use StratifiedShuffleSplit
to further split the test set in a stratified way too:
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
kf = StratifiedKFold(n_splits=10)
for fold, (train_index, test_index) in enumerate(kf.split(X, y), 1):
X_train = X[train_index]
y_train = y[train_index]
X_test = X[test_index]
y_test = y[test_index]
#stratified split on the test set
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
X_test_ix, X_tune_ix = next(sss.split(X_test, y_test))
X_test_ = X_test[X_test_ix]
y_test_ = y_test[X_test_ix]
X_tune = X_test[X_tune_ix]
y_tune = y_test[X_tune_ix]
Upvotes: 1