Sm1

Reputation: 570

Calculating accuracy for multi-class classification

Consider a three class classification problem with the following confusion matrix.

cm_matrix = 
                predict_class1    predict_class2    predict_class3
                 ______________    ______________    ______________

Actual_class1         2000                 0                 0     
Actual_class2           34              1966                 0     
Actual_class3            0                 0              2000   



Multi-Class Confusion Matrix Output
                     TruePositive    FalsePositive    FalseNegative    TrueNegative
                     ____________    _____________    _____________    ____________

    Actual_class1        2000             34                0              3966    
    Actual_class2        1966              0               34              4000    
    Actual_class3        2000              0                0              4000    
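For reference, a minimal sketch of how these four counts can be derived from the confusion matrix, assuming rows are actual classes and columns are predicted classes as laid out above. The variable names (`TP`, `FN`, `FP`, `TN`, `counts`) are illustrative and not taken from any particular toolbox.

```matlab
% Confusion matrix from the question: rows = actual, columns = predicted.
cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

TP = diag(cm_matrix);                    % diagonal: correctly predicted per class
FN = sum(cm_matrix, 2) - TP;             % row total minus diagonal
FP = sum(cm_matrix, 1)' - TP;            % column total minus diagonal
TN = sum(cm_matrix(:)) - TP - FP - FN;   % everything else

% One row per class, columns ordered as in the table: TP, FP, FN, TN.
counts = [TP FP FN TN];
disp(counts)
```

With this layout, `counts` reproduces the table row by row.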

The formulas that I have used are:

Accuracy of each class = TP ./ (total instances of that class)

(formula based on an answer here: Individual class accuracy calculation confusion)

Sensitivity = TP ./ (TP + FN);

The implementation of it in Matlab is:

acc_1  = 100*(cm_matrix(1,1))/sum(cm_matrix(1,:)) = 100*(2000)/(2000+0+0) = 100
acc_2  = 100*(cm_matrix(2,2))/sum(cm_matrix(2,:)) =  100*(1966)/(34+1966+0) = 98.3
acc_3  = 100*(cm_matrix(3,3))/sum(cm_matrix(3,:)) = 100*(2000)/(0+0+2000) = 100

sensitivity_1 = 2000/(2000+0) = 1 = acc_1/100
sensitivity_2 = 1966/(1966+34) = 0.983 = acc_2/100
sensitivity_3 = 2000/(2000+0) = 1 = acc_3/100

Question1) Is my formula for Accuracy of each class correct? For calculating accuracy of each individual class, say for positive class I should take the TP in the numerator. Similarly, for accuracy of only the negative class, I should consider TN in the numerator in the formula for accuracy. Is the same formula applicable to binary classification? Is my implementation of it correct?

Question2) Is my formula for sensitivity correct? Then how come I am getting same answer as individual class accuracies?

Upvotes: 6

Views: 24302

Answers (2)

beaker

Reputation: 16821

Question1) Is my formula for Accuracy of each class correct?

No, the formula you're using is for the Sensitivity (Recall). See below.

For calculating accuracy of each individual class, say for positive class I should take the TP in the numerator. Similarly, for accuracy of only the negative class, I should consider TN in the numerator in the formula for accuracy. Is the same formula applicable to binary classification? Is my implementation of it correct?

Accuracy is the ratio of the number of correctly classified instances to the total number of instances. TN, or the number of instances correctly identified as not being in a class, are correctly classified instances, too. You cannot simply leave them out.
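As a minimal sketch of that definition, the overall accuracy can be read directly off the confusion matrix: the diagonal holds the correctly classified instances and the sum of all entries is the total. The variable name `overall_acc` is just an illustration.

```matlab
% Confusion matrix from the question: rows = actual, columns = predicted.
cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

% Correctly classified instances (diagonal) over all instances.
overall_acc = trace(cm_matrix) / sum(cm_matrix(:));
fprintf('Overall accuracy: %.5f\n', overall_acc);
```

For this matrix the ratio is 5966/6000, since only the 34 class-2 instances predicted as class 1 are wrong.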

Accuracy is also normally only used for evaluating the entire classifier for all classes, not individual classes. You can, however, generalize the accuracy formula to handle individual classes, as done here for computing the average classification accuracy for a multiclass classifier. (See also the referenced article.)

The formula they use for each class is:

acc_i = (TP_i + TN_i) / (TP_i + FP_i + FN_i + TN_i)

As you can see, it is identical to the usual formula for accuracy, but we only take into account the individual class's TP and TN scores (the denominator is still the total number of observations). Applying this to your data set, we get:

acc_1 = (2000+3966)/(2000+34+0+3966) = 0.99433
acc_2 = (1966+4000)/(1966+0+34+4000) = 0.99433
acc_3 = (2000+4000)/(2000+0+0+4000)  = 1.00000

This at least makes more intuitive sense, since the first two classes had mis-classified instances and the third did not. Whether these measures are at all useful is another question.
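The three values above can be reproduced with a short vectorized sketch. It assumes rows of `cm_matrix` are actual classes and columns are predictions; `acc_per_class` is an illustrative name.

```matlab
% Confusion matrix from the question: rows = actual, columns = predicted.
cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

N  = sum(cm_matrix(:));          % total number of observations
TP = diag(cm_matrix);            % per-class true positives
FN = sum(cm_matrix, 2) - TP;     % per-class false negatives (rest of the row)
FP = sum(cm_matrix, 1)' - TP;    % per-class false positives (rest of the column)
TN = N - TP - FP - FN;           % per-class true negatives

% Per-class accuracy: correct decisions about "class i vs. not class i".
acc_per_class = (TP + TN) ./ (TP + FP + FN + TN);
disp(acc_per_class')
```

Note that the denominator is the same `N` for every class, so the per-class values differ only through `FP + FN`.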


Question2) Is my formula for sensitivity correct?

Yes, Sensitivity is given as:

TP / (TP + FN)

which is the ratio of the instances correctly identified as being in this class to the total number of instances in the class. In a binary classifier, you are by default calculating the sensitivity for the positive class. The complement of the sensitivity is the miss rate (also called the false negative rate in the Wikipedia article) and is simply:

FN / (TP + FN) = 1 - Sensitivity

The sensitivity for the negative class is the specificity, TN / (TN + FP): TN is nothing more than the TP for the negative class, and FP is its FN. (The meanings of positive and negative are simply reversed.) So it is natural to extend this to all classes as you have done.

Then how come I am getting same answer as individual class accuracies?

Because you're using the same formula for both.

Look at your confusion matrix:

cm_matrix = 
                predict_class1    predict_class2    predict_class3
                 ______________    ______________    ______________

Actual_class1         2000                 0                 0     
Actual_class2           34              1966                 0     
Actual_class3            0                 0              2000

TP for class 1 is obviously 2000

cm_matrix(1,1)

FN is the sum of the other two columns in that row. Therefore, TP+FN is the sum of row 1

sum(cm_matrix(1,:))

That's exactly the formula you used for the accuracy.

acc_1  = 100*(cm_matrix(1,1))/sum(cm_matrix(1,:)) = 100*(2000)/(2000+0+0) = 100
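A small sketch makes the equivalence concrete for all three classes at once. It assumes the same row/column layout as `cm_matrix`; `sensitivity` and `row_ratio` are illustrative names.

```matlab
% Confusion matrix from the question: rows = actual, columns = predicted.
cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

TP = diag(cm_matrix);
FN = sum(cm_matrix, 2) - TP;

% Sensitivity from its definition.
sensitivity = TP ./ (TP + FN);

% The question's "accuracy" formula without the factor of 100:
% diagonal entry divided by the row sum.
row_ratio = diag(cm_matrix) ./ sum(cm_matrix, 2);

% Both columns are identical because TP + FN is the row sum.
disp([sensitivity row_ratio])
```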

Upvotes: 3

Catalina Chircu

Reputation: 1574

Answer to question 1. It seems that accuracy is used only in binary classification; check this link. The answer on this site that you refer to also concerns binary classification (i.e. classification into two classes only). You have more than two classes, so you should either try a different measure or use a one-versus-all classification for each class (for each class, split the predictions into class_n and non_class_n).
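One way to picture the one-versus-all idea is to collapse the multi-class confusion matrix into a 2x2 matrix for a chosen class. This is only a sketch: the matrix values come from the question, rows are assumed to be actual classes, and the names `n`, `others` and `binary_cm` are made up for illustration.

```matlab
% Confusion matrix from the question: rows = actual, columns = predicted.
cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

n      = 2;                                   % class treated as "positive"
others = setdiff(1:size(cm_matrix, 1), n);    % all remaining classes

TP = cm_matrix(n, n);                         % class_n predicted as class_n
FN = sum(cm_matrix(n, others));               % class_n predicted as something else
FP = sum(cm_matrix(others, n));               % other classes predicted as class_n
TN = sum(sum(cm_matrix(others, others)));     % other classes not predicted as class_n

% Binary confusion matrix: rows = actual (class_n, non_class_n),
% columns = predicted (class_n, non_class_n).
binary_cm = [TP FN; FP TN];
disp(binary_cm)
```

Any binary measure can then be applied to `binary_cm`, and the same steps can be repeated for each value of `n`.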

Answer to question 2. Same issue: this measure is appropriate for binary classification, which is not your case.

The formula for sensitivity is:

TP./(TP + FN)

The formula for accuracy is:

(TP + TN)./(TP + FN + FP + TN)

See the documentation here.

UPDATE

And if you wish to use the confusion matrix, you have:

TP on the diagonal, at the position of the class; FN is the sum of the other values in the row of the class, and FP is the sum of the other values in the column of the class. In the function getvalues, start counting lines from the declaration of the function and check lines 30 and 31:

TP(i)=c_matrix(i,i);
FN(i)=sum(c_matrix(i,:))-c_matrix(i,i);
FP(i)=sum(c_matrix(:,i))-c_matrix(i,i);
TN(i)=sum(c_matrix(:))-TP(i)-FP(i)-FN(i);

If you apply the accuracy formula, you obtain, after calculating and simplifying:

accuracy = (sum(c_matrix(:)) - sum(c_matrix(i,:)) - sum(c_matrix(:,i)) + 2*c_matrix(i,i)) / sum(c_matrix(:))

For the sensitivity you obtain, after simplifying:

sensitivity =  c_matrix(i,i) / sum(c_matrix(i,:))

If you want to understand better, just check the links I sent you.

Upvotes: 3
