ADJ

Reputation: 5282

How do I get CatBoost get_object_importance to work with AUC?

I replicated the example here.

The example tries to improve RMSE (lower is better).

My limited understanding is that CatBoost will try to minimize LogLoss under the hood. In this example lower LogLoss seems to correlate to lower RMSE.

RMSE on validation datset when 0 harmful objects from train are dropped: 0.25915746122622113
RMSE on validation datset when 250 harmful objects from train are dropped: 0.25601149050939825
RMSE on validation datset when 500 harmful objects from train are dropped: 0.25158044983631966
RMSE on validation datset when 750 harmful objects from train are dropped: 0.24570533776587475
RMSE on validation datset when 1000 harmful objects from train are dropped: 0.24171376432589384
RMSE on validation datset when 1250 harmful objects from train are dropped: 0.23716221792112202
RMSE on validation datset when 1500 harmful objects from train are dropped: 0.23352830055657348
RMSE on validation datset when 1750 harmful objects from train are dropped: 0.23035731488436903
RMSE on validation datset when 2000 harmful objects from train are dropped: 0.2275943109556251

Besides observing RMSE with cb.eval_metrics(validation_pool, ['RMSE'])['RMSE'][-1], the example doesn't really use RMSE as a custom loss function.

cb = CatBoost({'iterations': 100, 'verbose': False, 'random_seed': 42})
print(cb.eval_metrics(validation_pool, ['RMSE'])['RMSE'][-1])

In my case I have a binary classification problem and I want to maximize AUC. I'm not sure whether I should leave the code as is and hope that lower Logloss correlates with higher AUC (it doesn't), or whether I need to set this up differently, perhaps by using AUC as a custom loss/eval_metric function and then flipping importance_values_sign from 'Positive' to 'Negative'.

Upvotes: 0

Views: 692

Answers (1)

nikitxskv

Reputation: 11

With loss_function='RMSE', CatBoost tries to minimize the RMSE loss function, not Logloss. RMSE is the default CatBoost loss function, so the example minimizes RMSE even though it never sets loss_function explicitly.

CatBoost evaluates Logloss using the formula from this page. Lower Logloss therefore tends to go together with higher AUC.
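The correlation is a tendency rather than a guarantee, which may be what the question ran into. Here is a minimal pure-Python sketch of the two metrics. The toy labels and predictions are hand-picked for illustration and are not from the original post, and the helper names logloss and auc are hypothetical, not CatBoost functions.

```python
import math

def logloss(y_true, p):
    """Mean binary cross-entropy; lower is better."""
    total = sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, p))
    return -total / len(y_true)

def auc(y_true, p):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [q for y, q in zip(y_true, p) if y == 1]
    neg = [q for y, q in zip(y_true, p) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
confident = [0.05, 0.60, 0.55, 0.95]  # one mis-ranked pair, confident elsewhere
cautious = [0.45, 0.48, 0.52, 0.55]   # perfect ranking, scores close to 0.5

for name, preds in [("confident", confident), ("cautious", cautious)]:
    print(name, "Logloss:", logloss(y, preds), "AUC:", auc(y, preds))
```

In this toy case the "confident" predictions have the lower Logloss but also the lower AUC, because AUC only looks at the ranking while Logloss also rewards calibrated probabilities. Even so, Logloss is the objective the model is trained on, and AUC is the metric to watch.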

So, you just need to replace

cb = CatBoost({'iterations': 100, 'verbose': False, 'random_seed': 42})

with

cb = CatBoost({'loss_function': 'Logloss', 'iterations': 100, 'verbose': False, 'random_seed': 42})

Then track AUC instead of RMSE when you evaluate on the validation pool.

Upvotes: 1
