How to get the adjacent accuracy scores for a multiclass classification problem in Python?

Question

I want to calculate the average percentage hit rate of the true class and the adjacent classes and implement it in my cross validation.

#Example of my classification problem (in total 9 classes)
y_true = [0, 0, 1, 5, 3, 4]
y_pred = [0, 1, 0, 8, 6, 5]

The regular accuracy would be 16.67% (only the first prediction is exactly right). However, I would like to get the 'adjacent accuracy', which would be 66.67% in this case (the first three predictions count as 'correct', together with the last one).

The formula would be like this: [image: adjacent accuracy formula]

where Pi stands for the total number of samples classified as class i, g is the total number of classes (= here 9), and n is the total number of samples.

I have already looked at this other question but it isn't particularly helpful since I would like to incorporate this scoring measure into a cross_validate function.

This is my current code:

from sklearn.model_selection import cross_validate, cross_val_predict

scoringX = {'acc': 'accuracy',
            'prec_macro': 'precision_macro',
            'rec_macro': 'recall_macro',
            'auc': 'roc_auc_ovr_weighted'}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)
cv_predict_rf = cross_val_predict(clf, X, y, cv=kcv)

This is what I would ideally like to end up with:

from sklearn.metrics import make_scorer

scoringX = {'acc': 'accuracy',
            'prec_macro': 'precision_macro',
            'rec_macro': 'recall_macro',
            'auc': 'roc_auc_ovr_weighted',
            'adjacent_auc': make_scorer(custom_adjacent_accuracy_score)}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)
cv_predict_rf = cross_val_predict(clf, X, y, cv=kcv)

Thanks in advance!

Erlend Magnus Viggen · Accepted Answer

I actually wrote a question on Cross Validated a few months ago on how to express adjacent accuracy mathematically, and after some thinking I answered it with a formula that's a bit simpler than the one you gave. (You'll unfortunately have to follow the link to see it; Stack Overflow doesn't support math typesetting.)

This formula can be implemented fairly easily if we convert y_true and y_pred into numpy arrays:

import numpy as np

y_true = np.array([0, 0, 1, 5, 3, 4])
y_pred = np.array([0, 1, 0, 8, 6, 5])

precise_accuracy = np.sum(y_pred == y_true) / len(y_pred)
adjacent_accuracy = np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)

I included the simpler calculation for precise accuracy to help make the adjacent accuracy easier to understand by comparison:

In the precise accuracy, we simply count the number of predictions that equal the true value and normalise by the number of predictions. y_pred == y_true is an array of True and False, and the summation simply counts the number of True values.
In the adjacent accuracy, we instead count the number of predictions whose ‘class distance’ np.abs(y_pred - y_true) to the true value is no more than one.
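
To make the comparison concrete, here is a small sketch that breaks both calculations into their intermediate arrays, using the example data from the question. The variable names exact_hits, class_distance and adjacent_hits are only illustrative.

```python
import numpy as np

y_true = np.array([0, 0, 1, 5, 3, 4])
y_pred = np.array([0, 1, 0, 8, 6, 5])

# Boolean mask: True where the prediction equals the true class
exact_hits = y_pred == y_true

# How many classes away each prediction is from the true class
class_distance = np.abs(y_pred - y_true)

# Boolean mask: True where the prediction is the true class or a neighbour
adjacent_hits = class_distance <= 1

# Summing a boolean array counts its True values
precise_accuracy = exact_hits.sum() / len(y_pred)      # 1 hit out of 6
adjacent_accuracy = adjacent_hits.sum() / len(y_pred)  # 4 hits out of 6

print(class_distance.tolist())
print(precise_accuracy, adjacent_accuracy)
```

The class distances come out as 0, 1, 1, 3, 3, 1, so four of the six predictions are within one class of the truth, which matches the 66.67% expected in the question.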

I think the function you want can be implemented like this:

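def custom_adjacent_accuracy_score(y_true, y_pred):
    # Fraction of predictions that are at most one class away from the true class.
    # Assumes integer class labels whose order is meaningful.
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)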
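
As a rough sketch of how this could plug into cross_validate, the example below wraps the function with make_scorer and runs it on made-up data. Everything apart from the scoring function is an assumption for illustration: the synthetic nine-class dataset, the RandomForestClassifier, the StratifiedKFold settings and the 'adjacent_acc' key are placeholders for your own clf, X, y and kcv.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate


def custom_adjacent_accuracy_score(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)


# Hypothetical data: 9 ordered classes obtained by binning a noisy linear score
rng = np.random.default_rng(0)
X = rng.normal(size=(450, 4))
score = X @ np.array([1.0, 0.5, -0.5, 0.25]) + rng.normal(scale=0.5, size=450)
edges = np.quantile(score, np.linspace(0, 1, 10)[1:-1])  # 8 interior cut points
y = np.digitize(score, edges)  # integer labels 0..8

# Placeholder estimator and splitter; substitute your own clf and kcv
clf = RandomForestClassifier(n_estimators=50, random_state=0)
kcv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scoringX = {'acc': 'accuracy',
            'adjacent_acc': make_scorer(custom_adjacent_accuracy_score)}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)

# Results are stored under 'test_<key>', one value per fold
print(cv_scores_rf['test_acc'].mean())
print(cv_scores_rf['test_adjacent_acc'].mean())
```

Note that this only makes sense if the class labels are integers whose numerical order reflects the order of the classes. Since every exact hit is also an adjacent hit, the adjacent accuracy in each fold should never be lower than the regular accuracy.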
