Franck Dernoncourt

Reputation: 83137

Getting the accuracy for multi-label prediction in scikit-learn

In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

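This way of computing the accuracy is sometimes called, perhaps less ambiguously, the exact match ratio (1):

$$ \text{ExactMatchRatio} = \frac{1}{|D|} \sum_{i=1}^{|D|} I(x_i = y_i), $$

where |D| is the number of samples, y_i is the set of true labels of sample i, x_i is the set of predicted labels, and I is the indicator function.

Is there any way to get the other typical way to compute the accuracy in scikit-learn, namely

$$ \text{Accuracy} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|x_i \cap y_i|}{|x_i \cup y_i|} $$

(as defined in (1) and (2), and less ambiguously referred to as the Hamming score (4) (since it is closely related to the Hamming loss), or label-based accuracy)?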


(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

(2) Tsoumakas, Grigorios, and Ioannis Katakis. "Multi-label classification: An overview." Dept. of Informatics, Aristotle University of Thessaloniki, Greece (2006).

(3) Ghamrawi, Nadia, and Andrew McCallum. "Collective multi-label classification." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.

(4) Godbole, Shantanu, and Sunita Sarawagi. "Discriminative methods for multi-labeled classification." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 22-30.

Upvotes: 28

Views: 23962

Answers (4)

Ierihon

Reputation: 11

I really like the comments above, but I want to add an answer that goes deep into the Hamming score concept.

Imagine having three classes and an object corresponding to class 1 and class 2 by ground truth. Your classifier predicts this sample as being of class 1 only. How do Accuracy and the Hamming score view such a situation?

Accuracy: This prediction is incorrect because the predicted classes do not exactly match the ground-truth ones. The Accuracy score is 0.

Hamming score: We split the ground truth into two parts - one for class 1 and the other for class 2. In such a case, the algorithm got one part correct and failed on the other. The Hamming score for the prediction is 0.5.

When evaluating a multi-label task, the Hamming score takes partially correct predictions into account. The Hamming score algorithm for a multi-label classification task is as follows:

  1. Get predictions from your model;
  2. Split the ground truth and predictions into parts;
  3. Compare the corresponding pieces and calculate the number of correct predictions;
  4. Divide that count by the total number of pieces compared and analyze the obtained value.
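
The three-class example above can be sketched in plain Python. This is only an illustration: the helper names `subset_accuracy` and `hamming_score_single` are hypothetical, and the score is assumed to follow the intersection-over-union definition used in the other answers on this page.

```python
def subset_accuracy(true_labels, pred_labels):
    # 1.0 only when the predicted label set equals the true label set exactly
    return 1.0 if set(true_labels) == set(pred_labels) else 0.0


def hamming_score_single(true_labels, pred_labels):
    # Share of labels in (true OR predicted) that appear in both sets
    true_set, pred_set = set(true_labels), set(pred_labels)
    union = true_set | pred_set
    if not union:
        # Assumption: two empty label sets count as a perfect match
        return 1.0
    return len(true_set & pred_set) / len(union)


true_labels = {1, 2}  # the object belongs to class 1 and class 2
pred_labels = {1}     # the classifier predicts class 1 only

print(subset_accuracy(true_labels, pred_labels))       # 0.0
print(hamming_score_single(true_labels, pred_labels))  # 0.5
```

Averaging `hamming_score_single` over all samples would give the dataset-level score.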

I learned a lot about the Hamming score here - check it out if you are interested in the concept behind this metric and how it actually differs from the Accuracy score.

Upvotes: 1

Daniel Sun

Reputation: 90

Improved answer based on nocibambi's, handling zero division:

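    import numpy as np

    def hamming_score(y_true: np.ndarray, y_pred: np.ndarray):
        # Per-sample sizes of the intersection and union of true and predicted labels
        numerator = (y_true & y_pred).sum(axis=1)
        denominator = (y_true | y_pred).sum(axis=1)

        # Samples with an empty union keep the preset value 1.0 instead of dividing by zero
        return np.divide(numerator, denominator, out=np.ones_like(numerator, dtype=np.float64),
                         where=denominator != 0).mean()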

Replace np.ones_like with np.zeros_like if you want 0 as the result of zero division.
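
To see what the zero-division handling changes, here is a small self-contained sketch. The function name `hamming_score_safe` and its `empty_value` parameter are made up for this illustration, and the two-row input is invented so that one sample has no true and no predicted labels.

```python
import numpy as np


def hamming_score_safe(y_true, y_pred, empty_value=1.0):
    # Hypothetical variant: empty_value is the score given to samples
    # whose true and predicted label sets are both empty
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    numerator = (y_true & y_pred).sum(axis=1)
    denominator = (y_true | y_pred).sum(axis=1)
    # Pre-fill the output; np.divide only overwrites entries where the mask is True
    out = np.full(numerator.shape, empty_value, dtype=np.float64)
    per_sample = np.divide(numerator, denominator, out=out, where=denominator != 0)
    return per_sample.mean()


y_true = np.array([[0, 1, 0],
                   [0, 0, 0]])  # second sample has no labels
y_pred = np.array([[0, 1, 1],
                   [0, 0, 0]])  # and none are predicted for it

print(hamming_score_safe(y_true, y_pred))                   # 0.75 = (0.5 + 1.0) / 2
print(hamming_score_safe(y_true, y_pred, empty_value=0.0))  # 0.25 = (0.5 + 0.0) / 2
```

Which value is appropriate depends on whether correctly predicting "no labels" should count as a hit in your task.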

Upvotes: 1

nocibambi

Reputation: 2421

A simple summary function:

import numpy as np

def hamming_score(y_true, y_pred):
    return (
        (y_true & y_pred).sum(axis=1) / (y_true | y_pred).sum(axis=1)
    ).mean()


# y_true and y_pred are the 0/1 integer arrays from William's answer on this page
hamming_score(y_true, y_pred)
# 0.375

Upvotes: 4

William

Reputation: 598

You can write one version yourself. Here is an example that ignores the normalize and sample_weight arguments.

import numpy as np

y_true = np.array([[0,1,0],
                   [0,1,1],
                   [1,0,1],
                   [0,0,1]])

y_pred = np.array([[0,1,1],
                   [0,1,1],
                   [0,1,0],
                   [0,0,0]])

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
    http://stackoverflow.com/q/32239577/395857
    '''
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set( np.where(y_true[i])[0] )
        set_pred = set( np.where(y_pred[i])[0] )
        #print('\nset_true: {0}'.format(set_true))
        #print('set_pred: {0}'.format(set_pred))
        tmp_a = None
        if len(set_true) == 0 and len(set_pred) == 0:
            tmp_a = 1
        else:
            tmp_a = len(set_true.intersection(set_pred))/\
                    float( len(set_true.union(set_pred)) )
        #print('tmp_a: {0}'.format(tmp_a))
        acc_list.append(tmp_a)
    return np.mean(acc_list)

if __name__ == "__main__":
    print('Hamming score: {0}'.format(hamming_score(y_true, y_pred))) # 0.375 (= (0.5+1+0+0)/4)

    # For comparison's sake:
    import sklearn.metrics

    # Subset accuracy
    # 0.25 (= (0+1+0+0) / 4) --> 1 if the prediction for one sample fully matches the gold, 0 otherwise.
    print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))

    # Hamming loss (smaller is better)
    # $$ \text{HammingLoss}(x_i, y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i, y_i)}{|L|}, $$
    # where
    #  - |D| is the number of samples
    #  - |L| is the number of labels
    #  - y_i is the ground truth
    #  - x_i is the prediction.
    # 0.416666666667 (= (1+0+3+1) / (3*4) )
    print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred))) 

Outputs:

Hamming score: 0.375
Subset accuracy: 0.25
Hamming loss: 0.416666666667
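
As a rough cross-check, the Hamming loss figure can be reproduced by hand with NumPy alone. The sketch below reuses the arrays from this answer; the variable names are only illustrative. It also shows that 1 minus the Hamming loss is not the Hamming score: it counts every matching label slot, including labels that are absent in both arrays.

```python
import numpy as np

y_true = np.array([[0, 1, 0],
                   [0, 1, 1],
                   [1, 0, 1],
                   [0, 0, 1]])

y_pred = np.array([[0, 1, 1],
                   [0, 1, 1],
                   [0, 1, 0],
                   [0, 0, 0]])

# Number of label slots that differ in each sample (the xor in the formula)
mismatches_per_sample = (y_true != y_pred).sum(axis=1)

# Total mismatches divided by |D| * |L| = 4 * 3 = 12
hamming_loss_manual = mismatches_per_sample.sum() / y_true.size

# Fraction of label slots predicted correctly, shared zeros included
label_wise_accuracy = 1.0 - hamming_loss_manual

print(mismatches_per_sample)  # [1 0 3 1]
print(hamming_loss_manual)    # 5/12, about 0.4167
print(label_wise_accuracy)    # 7/12, about 0.5833, versus a Hamming score of 0.375
```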

Upvotes: 27
