Reputation: 83427

Getting the accuracy for multi-label prediction in scikit-learn

In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

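This way of computing the accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1):

$$ \text{ExactMatchRatio} = \frac{1}{|D|} \sum_{i=1}^{|D|} I(Y_i = Z_i), $$

where |D| is the number of samples, Y_i is the set of true labels of sample i, Z_i is the set of predicted labels, and I is the indicator function.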

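Is there any way to get the other typical way to compute the accuracy in scikit-learn, namely

$$ \text{Accuracy} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap Z_i|}{|Y_i \cup Z_i|}, $$

where |D| is the number of samples, Y_i is the set of true labels of sample i, and Z_i is the set of predicted labels (as defined in (1) and (2), and less ambiguously referred to as the Hamming score (4) (since it is closely related to the Hamming loss), or label-based accuracy)?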

(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).

(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.

(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.

Upvotes: 28

Answers (4)

Ierihon

Reputation: 11

I really like the comments above, but I want to add an answer that goes deep into the Hamming score concept.

Imagine having three classes and an object corresponding to class 1 and class 2 by ground truth. Your classifier predicts this sample as being of class 1 only. How do Accuracy and the Hamming score view such a situation?

Accuracy: This prediction is incorrect as the predicted classes are not fully equal to the ground-truth ones. The Accuracy score is 0;

Hamming score: We split the ground truth into two parts - one for class 1 and the other for class 2. In such a case, the algorithm got one part correct and failed on the other. The Hamming score for the prediction is 0.5.

When evaluating a multi-label task, the Hamming score gives credit for partially correct predictions. The Hamming score algorithm for a multi-label classification task is as follows:

Get predictions from your model;
Split the ground truth and predictions into parts;
Compare the corresponding pieces and count the correct predictions;
Divide that count by the total number of predictions and analyze the resulting value.
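The three-class example above can be checked in a few lines. This is a minimal sketch using plain Python sets and the intersection-over-union definition from the question; the variable names are illustrative, and scoring an empty union as 1 is an assumption that mirrors William's answer on this page.

```python
# Labels are class indices; the sample truly belongs to classes 1 and 2.
true_labels = {1, 2}
# The classifier predicted class 1 only.
pred_labels = {1}

# Subset accuracy (exact match): 1 only if both label sets are identical.
subset_accuracy = 1.0 if true_labels == pred_labels else 0.0

# Hamming score: |intersection| / |union| of the two label sets.
# Assumption: an empty union (no true and no predicted labels) scores 1.
union = true_labels | pred_labels
hamming = len(true_labels & pred_labels) / len(union) if union else 1.0

print(subset_accuracy)  # 0.0
print(hamming)          # 0.5
```

Averaging these per-sample values over all samples gives the score for a whole dataset.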

I learned a lot about the Hamming score here; check it out if you are interested in the concept of this metric and how it actually differs from the Accuracy score.

Upvotes: 1

Daniel Sun

Reputation: 90

Improved answer based on nocibambi's, handling zero divisions:

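    import numpy as np

    def hamming_score(y_true: np.ndarray, y_pred: np.ndarray):
        # Inputs are binary indicator arrays with an integer or boolean dtype.
        numerator = (y_true & y_pred).sum(axis=1)
        denominator = (y_true | y_pred).sum(axis=1)

        # Rows with an empty union keep the default of 1 (np.float_ was removed in NumPy 2.0).
        return np.divide(numerator, denominator, out=np.ones_like(numerator, dtype=np.float64),
                         where=denominator != 0).mean()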

Replace np.ones_like with np.zeros_like if you want 0 as the result of a zero division, that is, for samples with no true and no predicted labels.
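To see the effect of that choice, here is a small sketch that exposes the fallback value as a parameter. The function name `hamming_score_safe` and the `empty_value` argument are hypothetical and not part of NumPy or scikit-learn; the second sample has neither true nor predicted labels, so it triggers the 0/0 case.

```python
import numpy as np

def hamming_score_safe(y_true, y_pred, empty_value=1.0):
    """Mean per-sample |true AND pred| / |true OR pred|.

    empty_value (hypothetical parameter) is the score given to rows
    that have no true and no predicted labels.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    numerator = (y_true & y_pred).sum(axis=1)
    denominator = (y_true | y_pred).sum(axis=1)
    # Pre-fill the output with the fallback, then divide only where it is defined.
    out = np.full(numerator.shape, empty_value, dtype=np.float64)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out.mean()

y_true = np.array([[0, 1, 0],
                   [0, 0, 0]])
y_pred = np.array([[0, 1, 1],
                   [0, 0, 0]])

score_one = hamming_score_safe(y_true, y_pred, empty_value=1.0)   # (0.5 + 1.0) / 2
score_zero = hamming_score_safe(y_true, y_pred, empty_value=0.0)  # (0.5 + 0.0) / 2
print(score_one, score_zero)  # 0.75 0.25
```

Which fallback is appropriate depends on whether "correctly predicting nothing" should count as a success in your evaluation.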

Upvotes: 1

nocibambi

Reputation: 2431

A simple summary function:

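import numpy as np

def hamming_score(y_true, y_pred):
    # y_true and y_pred are binary indicator arrays of shape (n_samples, n_labels)
    # with an integer or boolean dtype. A row with no true and no predicted
    # labels produces 0/0 (nan), so handle that case separately if it can occur.
    return (
        (y_true & y_pred).sum(axis=1) / (y_true | y_pred).sum(axis=1)
    ).mean()


# With the y_true and y_pred arrays from William's answer:
hamming_score(y_true, y_pred)
# 0.375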

Upvotes: 4

William

Reputation: 598

You can write one version yourself. Here is an example that ignores the sample_weight and normalize arguments.

import numpy as np

y_true = np.array([[0,1,0],
                   [0,1,1],
                   [1,0,1],
                   [0,0,1]])

y_pred = np.array([[0,1,1],
                   [0,1,1],
                   [0,1,0],
                   [0,0,0]])

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    http://stackoverflow.com/q/32239577/395857
    '''
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set( np.where(y_true[i])[0] )
        set_pred = set( np.where(y_pred[i])[0] )
        #print('\nset_true: {0}'.format(set_true))
        #print('set_pred: {0}'.format(set_pred))
        tmp_a = None
        if len(set_true) == 0 and len(set_pred) == 0:
            tmp_a = 1
        else:
            tmp_a = len(set_true.intersection(set_pred))/\
                    float( len(set_true.union(set_pred)) )
        #print('tmp_a: {0}'.format(tmp_a))
        acc_list.append(tmp_a)
    return np.mean(acc_list)

if __name__ == "__main__":
    print('Hamming score: {0}'.format(hamming_score(y_true, y_pred))) # 0.375 (= (0.5+1+0+0)/4)

    # For comparison sake:
    import sklearn.metrics

    # Subset accuracy
    # 0.25 (= (0+1+0+0) / 4) --> 1 if the prediction for a sample fully matches the gold labels, 0 otherwise.
    print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    # Hamming loss (smaller is better)
    # $$ \text{HammingLoss} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{\operatorname{xor}(x_i, y_i)}{|L|}, $$
    # where
    #  - |D| is the number of samples
    #  - |L| is the number of labels
    #  - y_i is the ground truth
    #  - x_i is the prediction.
    # 0.416666666667 (= (1+0+3+1) / (3*4) )
    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred)))

Outputs:

Hamming score: 0.375
Subset accuracy: 0.25
Hamming loss: 0.416666666667
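The three numbers above can also be reproduced with plain NumPy, which makes the difference between the metrics explicit. This is a sketch for illustration, not scikit-learn's implementation; it reuses the same `y_true` and `y_pred` data, and the other variable names are my own.

```python
import numpy as np

y_true = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]], dtype=bool)
y_pred = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]], dtype=bool)

# Subset accuracy: fraction of samples whose whole label row matches.
subset_accuracy = (y_true == y_pred).all(axis=1).mean()

# Hamming loss: fraction of individual label slots that differ (XOR).
hamming_loss = (y_true ^ y_pred).mean()

# Hamming score: mean of per-sample intersection over union.
intersection = (y_true & y_pred).sum(axis=1)
union = (y_true | y_pred).sum(axis=1)
# Every row in this data has a non-empty union, so plain division is safe here.
hamming_score = (intersection / union).mean()

print(subset_accuracy)  # 0.25
print(hamming_loss)     # 5/12, about 0.4167
print(hamming_score)    # 0.375
```

Note that the Hamming loss counts disagreements over all label slots, including labels that are absent from both rows, while the Hamming score only looks at labels present in the truth or the prediction. That is why 1 minus the Hamming loss (about 0.583) is not equal to the Hamming score here.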

Upvotes: 27
