Reputation: 31
I'm using RepeatedKFold for classification, and I'd like to get the actual and predicted values for each iteration across all repeats. My code is below:
#...Split the dataset...
kf = RepeatedKFold(n_splits=10, n_repeats=2)
kf.get_n_splits(X, y)
for train_index, test_index in kf.split(X, y):
    print("TRAIN" + str(train_index))
    print("TEST" + str(test_index))
    print("-----")
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
#...Feature Scaling...
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#...Train the models on the training set...
#Logistic Regression
classifier_1 = LogisticRegression(random_state = 0, solver='saga').fit(X_train, y_train.ravel())
#KNN
classifier_2 = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2).fit(X_train, y_train)
#Support Vector Machine
classifier_3=SVC(kernel='linear', probability=True, random_state=0).fit(X_train, y_train)
#Kernel SVM
classifier_4 = SVC(kernel = 'rbf', probability=True, random_state = 0).fit(X_train, y_train)
#...Get actual & oof predicted labels...
y_pred_oof_1 = cross_val_predict(classifier_1, X_test, y_test, cv=kf)
But I'm getting
ValueError Traceback (most recent call last)
<ipython-input-12-05331415c07a> in <module>()
1 #print(X)
2 #print(y)
----> 3 y_pred_oof_1 = cross_val_predict(classifier_1, X_test, y_test, cv=kf)
4
5 for i in range(len(y_test)):
/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
761
762 if not _check_is_permutation(test_indices, _num_samples(X)):
--> 763 raise ValueError('cross_val_predict only works for partitions')
764
765 inv_test_indices = np.empty(len(test_indices), dtype=int)
ValueError: cross_val_predict only works for partitions
Can somebody please tell me what I can change? (The cross_val_predict call is not inside the for loop.)
Upvotes: 2
Views: 951
Reputation: 168
Your issue is explained in this discussion: https://github.com/scikit-learn/scikit-learn/issues/16135
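cross_val_predict returns exactly one prediction per sample, so it requires the test folds to form a partition of the data (each sample appears in exactly one test fold). With RepeatedKFold, each sample lands in a test fold once per repeat, so multiple predictions are made for each sample and the partition check fails.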
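One possible workaround is to skip cross_val_predict and collect the out-of-fold predictions yourself, keeping one row per repeat. The sketch below is only an illustration under a few assumptions: the data is a synthetic stand-in generated with make_classification (not the asker's X and y), only the logistic regression model is shown, scaling is moved into a pipeline so it is refit on every training fold, and the names y_pred, y_true and times_tested are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

n_splits, n_repeats = 10, 2
rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)

# Scaling lives inside the pipeline so it is refit on each training fold only.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=5000, random_state=0),
)

# One row per repeat: within a repeat the test folds form a partition,
# so every sample receives exactly one prediction per row.
y_pred = np.empty((n_repeats, len(y)), dtype=y.dtype)
times_tested = np.zeros(len(y), dtype=int)

for i, (train_index, test_index) in enumerate(rkf.split(X, y)):
    repeat = i // n_splits  # RepeatedKFold yields all folds of one repeat, then the next
    model.fit(X[train_index], y[train_index])
    y_pred[repeat, test_index] = model.predict(X[test_index])
    times_tested[test_index] += 1

# The actual labels are identical in every repeat.
y_true = np.tile(y, (n_repeats, 1))
```

Here y_true[r] and y_pred[r] hold the actual and predicted labels for repeat r, and times_tested shows that each sample is tested n_repeats times, which is why a single call to cross_val_predict cannot return one value per sample. The same loop could be repeated for the other classifiers.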
Upvotes: 2