Reputation: 49
I'm trying to model a binary classification problem with custom cross-validation folds and an SVM classifier, but cross_val_predict raises the error **need at least one array to concatenate**. The code works fine with cv=3 in cross_val_predict, but when I pass custom_cv it gives this error.
Below is the code:
from sklearn.model_selection import LeavePOut
import numpy as np
from sklearn.svm import SVC
from time import *
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict,cross_val_score
clf = SVC(kernel='linear',C=25)
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8],[9,10]])
y = np.array([0,1,1,0,0])
lpo = LeavePOut(2)
print(lpo.get_n_splits(X))
LeavePOut(p=2)
test_index_list=[]
train_index_list=[]
for train_index, test_index in lpo.split(X,y):
    if(y[test_index[0]]==y[test_index[1]]):
        pass
    else:
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        train_index_list.append(train_index)
        test_index_list.append(test_index)
custom_cv = zip(train_index_list, test_index_list)
scores = cross_val_score(clf, X, y, cv=custom_cv)
print(scores)
print('accuracy:',scores.mean())
predicted=cross_val_predict(clf,X,y,cv=custom_cv) # error with this line
print('Confusion matrix:',confusion_matrix(labels, predicted))
Below is full trace of error:
ValueError Traceback (most recent call last)
<ipython-input-11-d78feac932b2> in <module>()
31 print(scores)
32 print('accuracy:',scores.mean())
---> 33 predicted=cross_val_predict(clf,X,y,cv=custom_cv)
34
35 print('Confusion matrix:',confusion_matrix(labels, predicted))
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
758 predictions = [pred_block_i for pred_block_i, _ in prediction_blocks]
759 test_indices = np.concatenate([indices_i
--> 760 for _, indices_i in prediction_blocks])
761
762 if not _check_is_permutation(test_indices, _num_samples(X)):
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate
Any suggestion about how to solve this error?
Upvotes: 0
Views: 1039
Reputation: 837
There are 2 errors here:

1. custom_cv is a `zip` object; create a list out of it. A zip object gets exhausted after you use it once, so cross_val_score consumes it and cross_val_predict then receives no splits at all, which is what triggers "need at least one array to concatenate". You can fix it like this:

custom_cv = [*zip(train_index_list, test_index_list)]

2. The test sets passed to `cross_val_predict` should be partitions of the actual array (each sample should belong to exactly one test set). In your case they aren't. If you think about it, your cross-validation list holds 6 splits with 2 test samples each, so stacking their outputs would result in a length 12 array, while the original y has length 5. You can implement a custom cross val predict like this:

from sklearn.metrics import confusion_matrix

def custom_cross_val_predict(clf, X, y, cv):
    # Collect true labels and predictions over every split;
    # a sample may appear in more than one test set.
    y_pred, y_true = [], []
    for tr_idx, vl_idx in cv:
        X_tr, y_tr = X[tr_idx], y[tr_idx]
        X_vl, y_vl = X[vl_idx], y[vl_idx]
        clf.fit(X_tr, y_tr)
        y_true.extend(y_vl)
        y_pred.extend(clf.predict(X_vl))
    return y_true, y_pred

labels, predicted = custom_cross_val_predict(clf,X,y,cv=custom_cv)
print('Confusion matrix:',confusion_matrix(labels, predicted))
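
Both points can be checked without scikit-learn. The sketch below is only an illustration, not part of the original code: it assumes that `itertools.combinations` enumerates the same index pairs as `LeavePOut(2)`, and the names `test_sets`, `train_sets`, `first_pass`, `second_pass` and `stacked` are made up for the example. It first shows that a zip object yields nothing on a second pass, then counts how many test indices the filtered splits produce in total compared with the number of samples.

```python
from itertools import combinations

y = [0, 1, 1, 0, 0]  # labels from the question

# Stand-in for LeavePOut(2) (an assumption): every pair of sample indices is a
# candidate test set. Keep only pairs whose labels differ, as in the question.
test_sets = [pair for pair in combinations(range(len(y)), 2)
             if y[pair[0]] != y[pair[1]]]
train_sets = [tuple(i for i in range(len(y)) if i not in pair)
              for pair in test_sets]

# 1) A zip object is a one-shot iterator.
custom_cv = zip(train_sets, test_sets)
first_pass = list(custom_cv)    # what the first cross-validation call would consume
second_pass = list(custom_cv)   # nothing is left for a second call
print(len(first_pass), len(second_pass))  # prints: 6 0

# 2) The test sets overlap, so they do not partition the samples.
stacked = [i for pair in test_sets for i in pair]
print(len(stacked), "test indices for", len(y), "samples")
# prints: 12 test indices for 5 samples
```

Because every sample index shows up in more than one test set, there is no single prediction per sample to return, which is why a hand-written loop that keeps the true labels next to the predictions is the more natural fit for these folds.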
Upvotes: 1