Liassa M.

Reputation: 31

RepeatedKFold & cross_val_predict

I'm using RepeatedKFold for classification, and I'd like to get the actual and predicted values for each iteration across all the repeats. My code is below:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

#...Split the dataset...
kf = RepeatedKFold(n_splits=10, n_repeats=2)
kf.get_n_splits(X, y)
for train_index, test_index in kf.split(X, y):
    print("TRAIN" + str(train_index))
    print("TEST" + str(test_index))
    print("-----")
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    #...Feature Scaling...
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    #...Train the models on the training set...
    # Logistic Regression
    classifier_1 = LogisticRegression(random_state=0, solver='saga').fit(X_train, y_train.ravel())

    # KNN
    classifier_2 = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2).fit(X_train, y_train)

    # Support Vector Machine
    classifier_3 = SVC(kernel='linear', probability=True, random_state=0).fit(X_train, y_train)

    # Kernel SVM
    classifier_4 = SVC(kernel='rbf', probability=True, random_state=0).fit(X_train, y_train)

#...Get actual & oof predicted labels...
y_pred_oof_1 = cross_val_predict(classifier_1, X_test, y_test, cv=kf)

But I'm getting the following error:

 ValueError                                Traceback (most recent call last)
<ipython-input-12-05331415c07a> in <module>()
      1 #print(X)
      2 #print(y)
----> 3 y_pred_oof_1 = cross_val_predict(classifier_1, X_test, y_test, cv=kf)
      4 
      5 for i in range(len(y_test)):

/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
    761 
    762     if not _check_is_permutation(test_indices, _num_samples(X)):
--> 763         raise ValueError('cross_val_predict only works for partitions')
    764 
    765     inv_test_indices = np.empty(len(test_indices), dtype=int)

ValueError: cross_val_predict only works for partitions
   

Can somebody please tell me what I can change? (The cross_val_predict call is not inside the for loop.)

Upvotes: 2

Views: 951

Answers (1)

jhmt

Reputation: 168

Your issue is explained in this discussion: https://github.com/scikit-learn/scikit-learn/issues/16135

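cross_val_predict returns exactly one prediction per sample, so it requires the test sets produced by cv to form a partition of the data, meaning every sample appears in exactly one test fold. With RepeatedKFold, every sample appears in a test fold once per repeat, so several predictions are made for each sample and the partition check fails with this ValueError.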
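
One possible workaround is to skip cross_val_predict and collect the out-of-fold predictions yourself while looping over the RepeatedKFold splits, keeping one row of predictions per repeat. The sketch below is only an illustration under some assumptions: it uses synthetic data from make_classification in place of your X and y, shows a single model (logistic regression with default solver and max_iter=1000), and the names rkf, test_counts and y_pred_oof are made up for this example. It also counts how often each sample lands in a test fold, which shows why the splits are not a partition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

n_splits, n_repeats = 10, 2
rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)

# How many times each sample is used as a test sample over all splits.
test_counts = np.zeros(len(y), dtype=int)

# One row of out-of-fold predictions per repeat.
y_pred_oof = np.empty((n_repeats, len(y)), dtype=y.dtype)

for i, (train_index, test_index) in enumerate(rkf.split(X, y)):
    # RepeatedKFold yields all folds of one repeat before the next repeat.
    repeat = i // n_splits
    test_counts[test_index] += 1

    # The scaler sits inside the pipeline, so it is fitted on the
    # training fold only and then applied to the test fold.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[train_index], y[train_index])
    y_pred_oof[repeat, test_index] = model.predict(X[test_index])

# Every sample is tested once per repeat, so the test sets overlap.
print("times each sample was tested:", np.unique(test_counts))

# Actual labels are y; predicted labels for a repeat are y_pred_oof[repeat].
for repeat in range(n_repeats):
    accuracy = np.mean(y_pred_oof[repeat] == y)
    print(f"repeat {repeat}: out-of-fold accuracy = {accuracy:.3f}")
```

If you would rather keep cross_val_predict, an alternative that should give the same kind of result is to call it once per repeat with a plain KFold(n_splits=10, shuffle=True, random_state=repeat). In that case pass the full X and y and an unfitted estimator (ideally a pipeline that includes the scaler), not the X_test and y_test left over from the last loop iteration.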

Upvotes: 2
