Which metric to use for an imbalanced classification problem?

Question

I am working on a classification problem with very imbalanced classes. My dataset has 3 classes: 0, 1 and 2. Class 0 is 11% of the training set, class 1 is 13% and class 2 is 75%.

I used a random forest classifier and got 76% accuracy. But I discovered that 93% of this accuracy comes from class 2 (the majority class). Here is the crosstab I got.

The results I would like to have:

fewer false negatives and/or fewer false positives for classes 0 and 1
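
To make that goal measurable, per-class precision (which drops with false positives) and recall (which drops with false negatives) can be read off the confusion matrix. The following is a minimal sketch; the helper name `per_class_precision_recall` and the toy labels are made up for illustration and are not the data from this question.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_precision_recall(y_true, y_pred, labels=(0, 1, 2)):
    """Return per-class precision and recall computed from the confusion matrix."""
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    true_pos = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class actually occurs
    # Guard against division by zero when a class is never predicted or is absent.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    return precision, recall

# Toy labels, invented for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 2, 0]
precision, recall = per_class_precision_recall(y_true, y_pred)
print("precision per class:", precision)
print("recall per class:   ", recall)
```

Tracking the entries for classes 0 and 1 separately shows whether a change actually helps the minority classes, which a single accuracy number can hide.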

What I found on the internet to solve the problem, and what I've tried:

Using class_weight='balanced' or a custom class_weight (1/11% for class 0, 1/13% for class 1, 1/75% for class 2). It doesn't change anything: the accuracy and crosstab stay the same. Is there an interpretation or explanation for this?
Since I know accuracy is not the best metric in this context, I tried other metrics: precision_macro, precision_weighted, f1_macro and f1_weighted. I also implemented the area under the precision-recall curve for each class and used the average as a metric.

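Here's my code (feedback welcome):

from sklearn.metrics import average_precision_score, make_scorer
from sklearn.preprocessing import label_binarize

def pr_auc_score(y_true, y_pred):
    # y_pred holds class probabilities, shape (n_samples, 3)
    y = label_binarize(y_true, classes=[0, 1, 2])
    # macro-averaged average precision over the three one-vs-rest problems
    return average_precision_score(y, y_pred)

# needs_proba=True passes predict_proba output to the metric
# (newer scikit-learn versions use response_method="predict_proba" instead)
pr_auc = make_scorer(pr_auc_score, greater_is_better=True, needs_proba=True)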
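
As a cross-check on a metric of this kind, the per-class average precision can also be computed directly from out-of-fold probabilities, without going through a scorer object. This is only a sketch under stated assumptions: the data is synthetic (generated with `make_classification` to roughly mimic the class mix), and the forest size and fold count are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.preprocessing import label_binarize

# Synthetic stand-in data with roughly an 11% / 13% / 76% class mix (an assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3,
    weights=[0.11, 0.13, 0.76], random_state=0,
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Out-of-fold class probabilities, shape (n_samples, 3).
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")

# One-vs-rest indicator matrix, then one average precision value per class.
y_bin = label_binarize(y, classes=[0, 1, 2])
per_class_ap = average_precision_score(y_bin, proba, average=None)
macro_ap = per_class_ap.mean()

print("average precision per class:", per_class_ap)
print("macro average:", macro_ap)
```

Printing the per-class values next to the macro average makes it visible when a good average is carried mostly by the majority class.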

And here's a plot of the precision-recall curves.

Alas, the crosstab remains the same for all these metrics; they seem to have no effect.

I also tuned the parameters of boosting algorithms (XGBoost and AdaBoost), with accuracy as the metric, and again the results did not improve. I don't understand, because boosting algorithms are supposed to handle imbalanced data.
Finally, I used another model (BalancedRandomForestClassifier), again with accuracy as the metric. The results are good, as this crosstab shows. I am happy with these results, but I notice that when I change the metric for this model, there is again no change in the results.

So I'm really interested in knowing why using class_weight, changing the metric, or using boosting algorithms doesn't lead to better results.
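
For reference, the two class_weight settings mentioned above can be compared directly. The sketch below uses made-up labels with 11, 13 and 75 samples per class (an assumption standing in for the real training set) and shows that inverse-frequency weights are proportional to what 'balanced' computes, differing only by a constant factor equal to the number of classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: 11, 13 and 75 samples of classes 0, 1 and 2.
y = np.repeat([0, 1, 2], [11, 13, 75])
classes = np.array([0, 1, 2])

# What class_weight='balanced' computes: n_samples / (n_classes * class_count).
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Custom inverse-frequency weights: 1 / (class share of the data).
freq = np.bincount(y) / len(y)
inverse_freq = 1.0 / freq

# The two weightings differ only by a constant factor (the number of classes).
ratio = inverse_freq / balanced
print("balanced:    ", balanced)
print("inverse freq:", inverse_freq)
print("ratio:       ", ratio)

# Custom weights are passed as a {label: weight} dict.
class_weight = dict(zip(classes.tolist(), inverse_freq.tolist()))
clf = RandomForestClassifier(class_weight=class_weight, random_state=0)
```

Since the two weightings have the same relative sizes, it is at least unsurprising that switching between them gives very similar crosstabs.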
