Reputation: 5282
I replicated the example here.
The example tries to improve RMSE (lower is better).
My limited understanding is that CatBoost will try to minimize LogLoss under the hood. In this example lower LogLoss seems to correlate to lower RMSE.
RMSE on validation datset when 0 harmful objects from train are dropped: 0.25915746122622113
RMSE on validation datset when 250 harmful objects from train are dropped: 0.25601149050939825
RMSE on validation datset when 500 harmful objects from train are dropped: 0.25158044983631966
RMSE on validation datset when 750 harmful objects from train are dropped: 0.24570533776587475
RMSE on validation datset when 1000 harmful objects from train are dropped: 0.24171376432589384
RMSE on validation datset when 1250 harmful objects from train are dropped: 0.23716221792112202
RMSE on validation datset when 1500 harmful objects from train are dropped: 0.23352830055657348
RMSE on validation datset when 1750 harmful objects from train are dropped: 0.23035731488436903
RMSE on validation datset when 2000 harmful objects from train are dropped: 0.2275943109556251
Besides observing RMSE with cb.eval_metrics(validation_pool, ['RMSE'])['RMSE'][-1], the example doesn't really use RMSE as a custom loss function.
cb = CatBoost({'iterations': 100, 'verbose': False, 'random_seed': 42})
print(cb.eval_metrics(validation_pool, ['RMSE'])['RMSE'][-1])
In my case I have a binary classification problem and I want to maximize AUC.
I'm not sure if I should just leave the code as is and hope that lower Logloss correlates to higher AUC (it doesn't), or if I need to set this up differently, perhaps using AUC as a custom loss/eval_metric function and then flipping importance_values_sign from 'Positive' to 'Negative'.
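To illustrate why lower Logloss need not mean higher AUC, here is a minimal sketch in plain Python. The labels and the two sets of scores (model_a, model_b) are made up for illustration and do not come from the CatBoost example; the helper functions are hand-written, not CatBoost's implementations.

```python
import math

def log_loss(y_true, p):
    """Mean binary cross-entropy (Logloss) over all objects."""
    total = sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, p))
    return -total / len(y_true)

def auc(y_true, p):
    """AUC as the share of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [q for y, q in zip(y_true, p) if y == 1]
    neg = [q for y, q in zip(y_true, p) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
# Hypothetical model A: perfect ranking, but every score sits near 0.5.
model_a = [0.48, 0.49, 0.51, 0.52]
# Hypothetical model B: confident scores, but one pair is ranked wrongly.
model_b = [0.05, 0.60, 0.55, 0.95]

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, "Logloss:", round(log_loss(y, scores), 4),
          "AUC:", auc(y, scores))
```

Model B ends up with the lower Logloss but also the lower AUC, because Logloss rewards confident, well-calibrated probabilities while AUC only looks at the ranking.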
Upvotes: 0
Views: 692
Reputation: 11
With loss_function='RMSE', CatBoost tries to minimize the RMSE loss function, not Logloss. RMSE is the default loss function of the CatBoost class.
CatBoost evaluates Logloss using the formula from this page. Therefore, lower Logloss correlates with higher AUC.
So, you just need to replace
cb = CatBoost({'iterations': 100, 'verbose': False, 'random_seed': 42})
with
cb = CatBoost({'loss_function': 'Logloss', 'iterations': 100, 'verbose': False, 'random_seed': 42})
Then observe AUC instead of RMSE.
Upvotes: 1