s.ali

Reputation: 156

Difference between balanced_accuracy_score and accuracy_score

I am using balanced_accuracy_score and accuracy_score both in sklearn.metrics.

According to the documentation, those two metrics should be the same, but in my code the first gives me 96% and the second 97%, while the accuracy reported during training is 98%.

Can you explain to me what is the difference between the three accuracies and how each is computed?

Note: this is a multi-class classification problem with three classes.

I have attached code samples.

accuracy is 98%

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.00001),
              metrics=['accuracy'])

accuracy is 96%

from sklearn.metrics import balanced_accuracy_score
balanced_accuracy_score(all_labels, all_predictions)

accuracy is 97%

from sklearn.metrics import accuracy_score
accuracy_score(all_labels, all_predictions)
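
For completeness, this is roughly how all_labels and all_predictions are produced (a sketch: I assume one-hot labels and a softmax output, and x_test / y_test stand in for my held-out data):

import numpy as np

# Sketch (assumption): labels are one-hot encoded and the model outputs
# softmax probabilities, so both are converted to class ids with argmax.
probabilities = model.predict(x_test)               # shape (n_samples, 3)
all_predictions = np.argmax(probabilities, axis=1)  # predicted class per sample
all_labels = np.argmax(y_test, axis=1)              # true class per sample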

Upvotes: 7

Views: 12973

Answers (2)

maleckicoa

Reputation: 571

Accuracy = (tp + tn) / (tp + tn + fp + fn) doesn't work well for imbalanced classes.

Therefore we can use Balanced Accuracy = (TPR + TNR) / 2

TPR = true positive rate = tp / (tp + fn), also called 'sensitivity'

TNR = true negative rate = tn / (tn + fp), also called 'specificity'

Balanced Accuracy gives almost the same results as the ROC AUC score; in the binary case with hard 0/1 predictions the two are identical.
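
To make the formulas concrete, here is a small sketch with made-up, imbalanced binary labels (y_true / y_pred are hypothetical):

import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Imbalanced toy data: 8 negatives, 2 positives
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.8
tpr = tp / (tp + fn)                        # sensitivity = 0.5
tnr = tn / (tn + fp)                        # specificity = 0.875
balanced = (tpr + tnr) / 2                  # 0.6875

print(accuracy, accuracy_score(y_true, y_pred))           # both 0.8
print(balanced, balanced_accuracy_score(y_true, y_pred))  # both 0.6875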

Links:

1 https://en.wikipedia.org/wiki/Precision_and_recall

2 https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html#sklearn.metrics.balanced_accuracy_score

3 https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

Upvotes: 8

seulberg1

Reputation: 1013

As far as I understand the problem (without knowing exactly what all_labels and all_predictions contain), the difference between balanced_accuracy_score and accuracy_score on your out-of-sample predictions is caused by the class balancing that the former performs.

accuracy_score simply returns the percentage of labels you predicted correctly (e.g. out of 1000 labels you predicted 980 correctly, so you get a score of 98%).

balanced_accuracy_score however works differently: it returns the average accuracy per class, which is a different metric. Say your 1000 labels come from 2 classes, with 750 observations in class 1 and 250 in class 2. If you mispredict 10 in each class, you have an accuracy of 740/750 ≈ 98.7% in class 1 and 240/250 = 96% in class 2. balanced_accuracy_score then returns (98.7% + 96%) / 2 ≈ 97.3%. So I believe the program works as expected, based on the documentation.
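
The numbers in that example can be checked directly with sklearn (the arrays below are synthetic, built only to match the example):

import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# 750 samples of class 1 (10 mispredicted) and 250 of class 2 (10 mispredicted)
y_true = np.array([1] * 750 + [2] * 250)
y_pred = np.array([1] * 740 + [2] * 10 + [2] * 240 + [1] * 10)

print(accuracy_score(y_true, y_pred))           # 980/1000 = 0.98
print(balanced_accuracy_score(y_true, y_pred))  # (740/750 + 240/250) / 2 ≈ 0.9733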

Upvotes: 16
