Reputation: 685
I have collected data on how long it takes for a product to be released in a release pipeline. 95% of the data so far takes <400 minutes [outlier = 0]. The remaining 5% falls between [700, 40 000] minutes [outlier = 1]. I want to build a classifier using xgboost which predicts whether an event will be an "outlier" or not. The problem is that outliers are very uncommon: I have about 200 datapoints which are outliers and 3200 datapoints which are not.
Currently, without tuning, my model can predict 98% of [outlier = 0] cases and 67% of [outlier = 1]. It is important for me that the model does not perform worse on detecting [outlier = 0], since 95% of the data is in this set, but I want to see if I can still tune the model to increase performance on detecting [outlier = 1].
So I have two variables:
ratio_wrong_0 = len(predicted_wrong_0) / len(true_0)
ratio_wrong_1 = len(predicted_wrong_1) / len(true_1)
I want to keep ratio_wrong_0 below 5% and minimize ratio_wrong_1 at the same time. Does anyone have an idea how I could construct such a metric for evaluation during parameter tuning?
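To make this concrete, here is a rough sketch of the kind of combined score I have in mind (the function name and the penalty scheme are just placeholders):

import numpy as np

def ratio_metric(y_true, y_pred, max_wrong_0=0.05):
    # Sketch: minimize ratio_wrong_1 while keeping ratio_wrong_0 below 5%.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ratio_wrong_0 = np.mean(y_pred[y_true == 0] != 0)
    ratio_wrong_1 = np.mean(y_pred[y_true == 1] != 1)
    if ratio_wrong_0 > max_wrong_0:
        return 1.0 + ratio_wrong_0  # hard penalty when the constraint is violated
    return ratio_wrong_1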
Upvotes: 1
Views: 70
Reputation: 33
First, if you keep the dataset as is, you will most likely always have a tendency to under-predict the [outlier = 1] class, since it is better performance-wise to predict [outlier = 0] when unsure, which you seem to understand.
There are a few simple things you can do (both are sketched in code below):
Under-sampling of the over-represented class: given you have 200 [outlier = 1] points, you could take roughly 200 [outlier = 0] points at random. However, the resulting dataset would probably be too small. It is easy to implement though, so you might want to give it a try.
Over-sampling of the under-represented class: the exact opposite, where you basically copy/paste data from [outlier = 1] to get roughly the same number of occurrences.
These methods are usually considered equivalent, but in your case I think over-sampling would lead to overfitting: the two classes don't cover the same range of values, and 200 data points spread over [700, 40 000] are not enough for proper generalization.
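A minimal sketch of both samplings with pandas (a DataFrame df with an "outlier" column is assumed):

import pandas as pd

majority = df[df["outlier"] == 0]
minority = df[df["outlier"] == 1]

# Under-sampling: draw as many majority rows as there are minority rows.
under = pd.concat([majority.sample(len(minority), random_state=42), minority])

# Over-sampling: repeat minority rows (with replacement) up to the majority size.
over = pd.concat([majority, minority.sample(len(majority), replace=True, random_state=42)])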
Now, to get into more advanced techniques, you could try bootstrapping. For the methodology, see "Bootstrap re-sampling for unbalanced data in supervised learning" by Georges Dupret and Masato Koda. This could work here, and you could use sklearn.utils.resample for it. I find this tutorial pretty good.
Bootstrapping is a resampling method that lets you train on multiple balanced datasets. You have to be careful about overfitting though.
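A rough illustration of the idea (a feature matrix X and labels y are assumed): train one model per balanced bootstrap sample and average the predicted probabilities.

import numpy as np
from sklearn.utils import resample
from xgboost import XGBClassifier

def bootstrap_ensemble(X, y, n_models=10, seed=0):
    X, y = np.asarray(X), np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    models = []
    for i in range(n_models):
        # Bootstrap the majority class down to the minority class size.
        X0_sample = resample(X0, n_samples=len(X1), random_state=seed + i)
        X_bal = np.vstack([X0_sample, X1])
        y_bal = np.concatenate([np.zeros(len(X0_sample)), np.ones(len(X1))])
        model = XGBClassifier()
        model.fit(X_bal, y_bal)
        models.append(model)
    return models

def ensemble_proba(models, X):
    # Average the predicted probability of the [outlier = 1] class.
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)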
About the metrics, you want to use either ROC AUC or Precision/Recall (PR curves tend to be more informative on unbalanced data). You can read a nice article on what metrics to use for unbalanced datasets.
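For example, with scikit-learn (predicted probabilities y_proba for the [outlier = 1] class are assumed):

from sklearn.metrics import roc_auc_score, average_precision_score

print("ROC AUC:", roc_auc_score(y_true, y_proba))
print("PR AUC :", average_precision_score(y_true, y_proba))  # area under the Precision/Recall curve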
Finally, you could use penalized algorithms (cost-sensitive learning), which essentially make a mistake on the least represented class (here [outlier = 1]) more costly. This is sometimes used in medical applications, where you would rather have a healthy patient diagnosed as sick by mistake than the opposite.
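In xgboost this is built in via the scale_pos_weight parameter; a common starting point is the ratio of negative to positive examples, here roughly 3200 / 200 = 16 (X_train and y_train are assumed):

from xgboost import XGBClassifier

# Make each mistake on the rare [outlier = 1] class count ~16x more.
model = XGBClassifier(scale_pos_weight=3200 / 200)
model.fit(X_train, y_train)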
This great article that sums it all up is a must read.
Upvotes: 1