Eric

Reputation: 41

How to define a loss function in XGBoost so that it only gives confident answers?

I'm writing an XGBClassifier model with a custom loss function for a specific purpose.

My Goal:

My dataset consists of data labeled with two classes: {-1, 1}. I want my model to output a prediction only when it is super confident about the class (I don't care if it opts out of 99% of predictions). So my approach is to let the model predict 0 (neutral) when it is not confident. Therefore:

Loss function I came up with:

loss = 0.02 + 0.06 * e^(-2.8 * y_pred * y_true)

When the model predicts neutral (0), the loss is intentionally nonzero (0.02 + 0.06 = 0.08) so that the model is incentivized to make {-1, 1} predictions from time to time. I plan to play around with the numbers to get the model working best.

Questions:

This is what I've attempted.

def custom_loss(y_pred, y_true):
  grad = -0.168 * np.exp(-2.8 * y_true)
  hess = [0] * y_pred.shape[0]
  return grad, hess

model = xgboost.XGBClassifier(
    learn_rate=0.1,
    max_depth=3,
    n_estimators=5000,
    subsample=0.4,
    colsample_bytree=0.4,
    objective=custom_loss,
    verbosity=1
)

model.fit(
    X_train, 
    y_train, 
    early_stopping_rounds=100, 
    eval_set=[(X_valid, y_valid)], 
    verbose=True
)

Training with this objective produces no change in the validation set accuracy, so there is definitely something wrong with my loss function.
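For reference, a custom objective has to return the first and second derivatives of the loss with respect to the prediction. Differentiating the loss above gives grad = -0.168 * y_true * e^(-2.8 * y_pred * y_true) and hess = 0.4704 * y_true^2 * e^(-2.8 * y_pred * y_true). The attempt above drops y_pred from the exponent and returns an all-zero Hessian, which leaves the booster with no curvature to compute leaf weights from and would be consistent with training not progressing. The sketch below is one way to write the derivatives, assuming the scikit-learn wrapper calls the objective as objective(y_true, y_pred); the names confident_loss and confident_loss_objective are hypothetical.

```python
import numpy as np

# Constants taken from the loss in the question:
# loss = A + B * exp(-K * y_pred * y_true)
A, B, K = 0.02, 0.06, 2.8


def confident_loss(y_true, y_pred):
    """Element-wise loss value, useful for checking the derivatives."""
    return A + B * np.exp(-K * y_pred * y_true)


def confident_loss_objective(y_true, y_pred):
    """Gradient and Hessian of the loss with respect to y_pred.

    Assumes labels are in {-1, 1} and that the caller passes
    (y_true, y_pred) in that order.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = np.exp(-K * y_pred * y_true)
    grad = -B * K * y_true * e          # first derivative
    hess = B * K ** 2 * y_true ** 2 * e  # second derivative, always > 0
    return grad, hess
```

Depending on the XGBoost version, the classifier wrapper may expect labels encoded as 0/1 rather than -1/1, in which case they would need mapping back to -1/1 inside the objective for these formulas to hold.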

Upvotes: 1

Views: 461

Answers (1)

Baradrist

Reputation: 192

Instead of writing your own loss function (as nice an idea as that may be), you could use XGBClassifier.predict_proba(), which is described here. It gives you an estimated "probability" for each class (be careful about treating it as a good estimate, since it is usually not well calibrated) that you can use to set the cutoff yourself. That puts you in full control of the final output step: you can map the predicted probability to {-1, 0, 1} however you like, for example by predicting a class in {-1, 1} only when its probability is at least 0.99 and outputting 0 otherwise. It is a simple, if not very sophisticated, solution to your problem.
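A minimal sketch of that thresholding step, assuming you already have the predicted probability of the positive class as an array (for a fitted binary classifier this would typically be the second column of predict_proba's output). The function name confident_predictions is hypothetical, and the 0.99 default is just the example threshold from above.

```python
import numpy as np


def confident_predictions(proba_pos, threshold=0.99):
    """Map positive-class probabilities to {-1, 0, 1}.

    Returns 1 where P(positive) >= threshold, -1 where
    P(positive) <= 1 - threshold, and 0 (abstain) otherwise.
    """
    proba_pos = np.asarray(proba_pos, dtype=float)
    out = np.zeros(proba_pos.shape, dtype=int)  # default: abstain
    out[proba_pos >= threshold] = 1
    out[proba_pos <= 1.0 - threshold] = -1
    return out


# Hypothetical probabilities; with a fitted model they might come from
# something like model.predict_proba(X_valid)[:, 1].
example = confident_predictions([0.995, 0.5, 0.003, 0.98])
```

Because the raw probabilities are usually not well calibrated, it may be worth choosing the threshold on a validation set rather than fixing it in advance.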

Upvotes: 0
