When I do something like:
scoring = ["accuracy", "balanced_accuracy", "f1", "precision", "recall", "roc_auc"]
scores = cross_validate(SVC(), my_x, my_y, scoring=scoring, cv=5, verbose=3, return_train_score=True, return_estimator=True)
how can I get the confusion matrix of a single validation run, e.g. the first one or, ideally, the best one?
I don't need a plot or anything fancy, only the numbers. If I could at least see the split, I could recalculate it.
If you want cross-validation to do something specific in each iteration, it may be best to write the loop yourself with a CV splitter such as StratifiedKFold:
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# X and y must be NumPy arrays (e.g. my_x and my_y from the question);
# convert them with np.asarray() first if they are lists or DataFrames
svm = SVC()
kf = StratifiedKFold(n_splits=5)
scores = []   # one score per fold
results = []  # one confusion matrix per fold
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)  # predict on the test features, not on y_test
    scores.append(accuracy_score(y_test, y_pred))  # use other scoring as preferred
    results.append(confusion_matrix(y_test, y_pred))
This computes the confusion matrix for each of the five folds and stores them in results. To get the confusion matrix of the best validation round, compute the scoring metric in the loop as well (see the scores list) and take the confusion matrix at the same index.
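As a minimal sketch of that last step, the snippet below runs the same loop on synthetic data (`make_classification` stands in for `my_x`/`my_y`, an assumption for illustration only) and uses `np.argmax` on the per-fold accuracies to pick the best fold's confusion matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic binary data standing in for my_x / my_y (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

kf = StratifiedKFold(n_splits=5)
scores = []   # accuracy per fold
results = []  # confusion matrix per fold
for train_index, test_index in kf.split(X, y):
    svm = SVC()  # fresh estimator for each fold
    svm.fit(X[train_index], y[train_index])
    y_pred = svm.predict(X[test_index])
    scores.append(accuracy_score(y[test_index], y_pred))
    results.append(confusion_matrix(y[test_index], y_pred))

# Index of the fold with the highest accuracy (the first one wins on ties)
best = int(np.argmax(scores))
best_cm = results[best]
print(f"best fold: {best}, accuracy: {scores[best]:.3f}")
print(best_cm)
```

Because scores and results are filled in the same order, the index of the best score is also the index of its confusion matrix; swap accuracy_score for any other metric to change what "best" means.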