Reputation: 75
I want to calculate the average percentage hit rate of the true class and the adjacent classes and implement it in my cross validation.
#Example of my classification problem (in total 9 classes)
y_true = [0, 0, 1, 5, 3, 4]
y_pred = [0, 1, 0, 8, 6, 5]
The regular accuracy would be 16.67% (only the first prediction is exactly right). However, I would like to get the 'adjacent accuracy', which would be 66.67% in this case (the first three predictions count as 'correct', together with the last one).
The formula would be like this: adjacent accuracy formula
where Pi stands for the total number of samples classified as class i, g is the total number of classes (= here 9), and n is the total number of samples.
I have already looked at this other question but it isn't particularly helpful since I would like to incorporate this scoring measure into a cross_validate function.
This is my current code:
from sklearn.model_selection import cross_validate, cross_val_predict

# clf, X, y and kcv are defined earlier in my script
scoringX = {'acc': 'accuracy',
            'prec_macro': 'precision_macro',
            'rec_macro': 'recall_macro',
            'auc': 'roc_auc_ovr_weighted'}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)
cv_predict_rf = cross_val_predict(clf, X, y, cv=kcv)
This is what I would ideally like to end up with:
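from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate, cross_val_predict

# custom_adjacent_accuracy_score is the function I am looking for
scoringX = {'acc': 'accuracy',
            'prec_macro': 'precision_macro',
            'rec_macro': 'recall_macro',
            'auc': 'roc_auc_ovr_weighted',
            'adjacent_auc': make_scorer(custom_adjacent_accuracy_score)}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)
cv_predict_rf = cross_val_predict(clf, X, y, cv=kcv)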
Thanks in advance!
Upvotes: 4
Views: 1192
Reputation: 383
I actually wrote a question on Cross Validated a few months ago on how to express adjacent accuracy mathematically, and after some thinking I answered it with a formula that's a bit simpler than the one you gave. (You'll unfortunately have to follow the link to see it; Stack Overflow doesn't support math typesetting.)
This formula can be implemented fairly easily if we convert y_true and y_pred into numpy arrays:
import numpy as np
y_true = np.array([0, 0, 1, 5, 3, 4])
y_pred = np.array([0, 1, 0, 8, 6, 5])
precise_accuracy = np.sum(y_pred == y_true) / len(y_pred)
adjacent_accuracy = np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)
I included the simpler calculation for precise accuracy to help make the adjacent accuracy easier to understand by comparison:
In the precise accuracy, we simply count the number of predictions that equal the true value and normalise by the number of predictions. y_pred == y_true is an array of True and False, and the summation simply counts the number of True values.
In the adjacent accuracy, we instead count the number of predictions whose ‘class distance’ np.abs(y_pred - y_true) to the true value is no more than one.
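To make the two counts concrete, here is a small sketch that prints the intermediate arrays for the example from the question. The variable names distance and within_one are my own, chosen only for illustration.

```python
import numpy as np

y_true = np.array([0, 0, 1, 5, 3, 4])
y_pred = np.array([0, 1, 0, 8, 6, 5])

# Class distance between each prediction and its true label
distance = np.abs(y_pred - y_true)
# True where the prediction is the true class or a neighbouring class
within_one = distance <= 1

print(distance.tolist())    # [0, 1, 1, 3, 3, 1]
print(within_one.tolist())  # [True, True, True, False, False, True]

# 1 of 6 predictions is exact, 4 of 6 are within one class
print(round(float(np.mean(y_pred == y_true)), 4))  # 0.1667
print(round(float(within_one.mean()), 4))          # 0.6667
```

Note that this only makes sense if the classes are integer-coded in their natural order, so that a difference of one really means 'adjacent'.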
I think the function you want can be implemented like this:
def custom_adjacent_accuracy_score(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)
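Plugging this into cross_validate should then look roughly like the sketch below. I don't have your data, so the dataset from make_classification, the RandomForestClassifier settings and the StratifiedKFold splitter are all stand-ins I assumed for clf, X, y and kcv; the key name 'adjacent_acc' is also my own choice. The synthetic labels are not really ordinal, so the printed numbers only show that the scorer runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate


def custom_adjacent_accuracy_score(y_true, y_pred):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.sum(np.abs(y_pred - y_true) <= 1) / len(y_pred)


# Stand-in data and model (assumptions, replace with your own X, y, clf, kcv)
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
kcv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# make_scorer wraps a (y_true, y_pred) function so cross_validate can call it
scoringX = {'acc': 'accuracy',
            'adjacent_acc': make_scorer(custom_adjacent_accuracy_score)}
cv_scores_rf = cross_validate(clf, X, y, cv=kcv, scoring=scoringX)

# Per-fold results are stored under 'test_<key>'
print(cv_scores_rf['test_acc'].mean())
print(cv_scores_rf['test_adjacent_acc'].mean())
```

The other string scorers from your dictionary can stay alongside the custom one. As a sanity check, the adjacent accuracy of a fold can never be lower than its regular accuracy, because every exact hit also counts as an adjacent hit.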
Upvotes: 3