Reputation: 803
I am applying cross-validation (CV) in several prediction tasks and would like to use the same folds every time for each of my parameter sets - and if possible also across different Python scripts, since performance really depends on the folds. I am working with sklearn's KFold:
kf = KFold(n_splits=folds, shuffle=False, random_state=1986)
and build my folds by
for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
and loop over them like
for idx_alpha, alpha in enumerate([0, 0.2, 0.4, 0.6, 0.8, 1]):
    # [...]
    for idx_split, (train_index, test_index) in enumerate(kf.split(X, Y)):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
Although I set a random_state and a NumPy seed, the folds are not identical every time. What can I do to make them reproducible, and possibly share my folds across several Python scripts?
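For reference, a minimal sketch (the array names here are illustrative, not from the post) of how `KFold` behaves: with a fixed `random_state`, `shuffle=True` re-seeds on every `split` call, so repeated calls on the same data produce identical folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative toy data, not from the original post
X = np.arange(20).reshape(10, 2)

# shuffle=True with a fixed random_state gives reproducible shuffled folds
kf = KFold(n_splits=5, shuffle=True, random_state=1986)

first = [test for _, test in kf.split(X)]
second = [test for _, test in kf.split(X)]

# Every call to split() yields the same test indices
assert all(np.array_equal(a, b) for a, b in zip(first, second))
```

Note that `random_state` only has an effect when `shuffle=True`; with `shuffle=False` the folds are deterministic contiguous blocks, so if they differ between runs, the order of the rows in `X` must be changing.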
Upvotes: 1
Views: 267
Reputation: 210972
You seem to be reinventing GridSearchCV ;-)
Try this approach:
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

param_grid = dict(alpha=[0, 0.2, 0.4, 0.6, 0.8, 1])
model = Lasso()  # put here the algorithm that you want to use
folds = 3
# alternatively you can prepare the folds yourself;
# note that random_state only takes effect with shuffle=True
#folds = KFold(n_splits=3, shuffle=True, random_state=1986)
grid_search = GridSearchCV(model, param_grid=param_grid, cv=folds, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
y_pred = grid_search.best_estimator_.predict(X_test)
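To share the exact same folds across several scripts, one option (a sketch, not from the original answer; the file name `folds.npz` is an arbitrary choice) is to generate the index arrays once and persist them to disk:

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative toy data, not from the original post
X = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=1986)
splits = list(kf.split(X))

# Save every train/test index array into one .npz archive
np.savez("folds.npz",
         **{f"train_{i}": tr for i, (tr, _) in enumerate(splits)},
         **{f"test_{i}": te for i, (_, te) in enumerate(splits)})

# In any other script, load the archive and reuse the identical folds
loaded = np.load("folds.npz")
for i in range(5):
    train_index, test_index = loaded[f"train_{i}"], loaded[f"test_{i}"]
    # X[train_index], X[test_index] ... same folds everywhere
```

`GridSearchCV`'s `cv` parameter also accepts a precomputed list of `(train_index, test_index)` pairs, so the loaded splits can be passed to it directly instead of an integer fold count.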
Upvotes: 2