user3025898

Reputation: 571

cross_val_score return accuracy per class

I would like sklearn's cross_val_score function to return the accuracy for each class separately, instead of the average accuracy over all classes.

Function:

sklearn.model_selection.cross_val_score(estimator, X, y=None, groups=None,
       scoring=None, cv='warn', n_jobs=None, verbose=0, fit_params=None,
       pre_dispatch='2*n_jobs', error_score='raise-deprecating')


How can I do it?

Upvotes: 4

Views: 2551

Answers (1)

MaximeKan

Reputation: 4211

This is not possible with cross_val_score. What you suggest would require cross_val_score to return an array of arrays. However, if you look at the source code, you will see that its output has to be:

Returns
-------
scores : array of float, shape=(len(list(cv)),)
    Array of scores of the estimator for each run of the cross validation.

As a result, cross_val_score checks whether the scoring you pass is multimetric. If it is, it raises an error like:

ValueError: scoring must return a number, got ... instead

Edit:

As a comment above correctly points out, an alternative is to use cross_validate instead. Here is how it would work on the Iris dataset, for instance:

from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data
y = iris.target

# One scorer per class: labels=[k] restricts recall_score to class k
scoring = {'recall0': make_scorer(recall_score, average=None, labels=[0]),
           'recall1': make_scorer(recall_score, average=None, labels=[1]),
           'recall2': make_scorer(recall_score, average=None, labels=[2])}

# Returns a dict with one array of per-fold scores for each scorer
scores = cross_validate(DecisionTreeClassifier(), X, y, scoring=scoring,
                        cv=5, return_train_score=False)
print(scores)
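
To read the per-class values back out, here is a minimal sketch of the same setup that pulls each metric from the returned dict. The dict comprehension, the `results` and `fold_scores` names, and `random_state=0` are my own additions for illustration, not part of the original snippet.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One scorer per class, built in a loop instead of by hand (illustrative choice)
scoring = {
    f"recall{k}": make_scorer(recall_score, average=None, labels=[k])
    for k in (0, 1, 2)
}

# random_state is fixed only to make the sketch reproducible
results = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, scoring=scoring, cv=5
)

# Each metric is stored under "test_<name>" as an array with one value per fold
for name in scoring:
    fold_scores = results[f"test_{name}"]
    print(name, fold_scores, "mean:", fold_scores.mean())
```

Each array has one entry per fold, so with cv=5 you get five recall values per class.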

Note that GridSearchCV also accepts a dict of scorers like this through its scoring parameter.
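
A rough sketch of the GridSearchCV variant, assuming a small hypothetical grid over max_depth. With several scorers, refit has to name the one used to pick the best parameters (or be False); choosing 'recall1' here is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scoring = {
    f"recall{k}": make_scorer(recall_score, average=None, labels=[k])
    for k in (0, 1, 2)
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4]},  # hypothetical grid, for illustration only
    scoring=scoring,
    refit="recall1",  # arbitrary choice of the metric that selects best_params_
    cv=5,
)
search.fit(X, y)

# cv_results_ holds "mean_test_<name>" with one value per parameter setting
for name in scoring:
    print(name, search.cv_results_[f"mean_test_{name}"])
```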

NB: You cannot return "accuracy for each class". I guess you meant recall, which is the proportion of correct predictions among the data points that actually belong to a given class.
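
To make that definition concrete, here is a small sketch that computes per-class recall from a confusion matrix. It uses cross_val_predict, which is not mentioned above, so the predictions are pooled over all folds rather than scored fold by fold; variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Out-of-fold prediction for every sample, pooled across the 5 folds
y_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=5)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y, y_pred)

# Recall of class k = correctly predicted samples of k / all samples truly in k
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)

# Same numbers as recall_score with average=None
print(np.allclose(per_class_recall, recall_score(y, y_pred, average=None)))
```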

Upvotes: 8
