mihael2039

Reputation: 11

How can I get a confusion matrix of a single run in sklearn cross_validate?

When I do something like:

scoring = ["accuracy", "balanced_accuracy", "f1", "precision", "recall", "roc_auc"]

scores = cross_validate(SVC(), my_x, my_y, scoring=scoring, cv=5, verbose=3, return_train_score=True, return_estimator=True)

how can I get a confusion matrix of a single validation run, e.g. the first one or ideally the best one?

I don't need a plot or anything fancy, just the numbers. If I could at least see the splits, I could recompute it myself.

Upvotes: 1

Views: 427

Answers (1)

afsharov

Reputation: 5164

If you want to do something quite specific during each cross-validation iteration, it is probably easiest to drive the loop yourself with a CV splitter such as StratifiedKFold:

from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

svm = SVC()
kf = StratifiedKFold(n_splits=5)

scores = []
results = []
# X, y are the full feature matrix and labels (my_x, my_y in the question)
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    scores.append(accuracy_score(y_test, y_pred))  # use other scoring as preferred
    results.append(confusion_matrix(y_test, y_pred))

This computes the confusion matrix for each of the five folds and stores them in results. To get the confusion matrix of the best validation round, compute the scoring metric inside the loop as well (see the scores list) and pick the confusion matrix at the index of the best score.
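Putting it together, here is a minimal runnable sketch of that last step. It uses a toy dataset from make_classification in place of your my_x / my_y (an assumption, since your data isn't shown) and np.argmax to select the best-scoring fold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Toy binary-classification data standing in for my_x / my_y
X, y = make_classification(n_samples=200, random_state=0)

svm = SVC()
kf = StratifiedKFold(n_splits=5)

scores, results = [], []
for train_index, test_index in kf.split(X, y):
    svm.fit(X[train_index], y[train_index])
    y_pred = svm.predict(X[test_index])
    scores.append(accuracy_score(y[test_index], y_pred))
    results.append(confusion_matrix(y[test_index], y_pred))

best = int(np.argmax(scores))   # index of the best-scoring fold
best_cm = results[best]         # its confusion matrix (2x2 here, binary y)
print(best, best_cm, sep="\n")
```

Each confusion matrix sums to the size of that fold's test set (here 40 = 200 / 5), so you can also sanity-check the splits this way.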

Upvotes: 1
