Reputation: 83
I have a special binary classification use case where the model's decisions determine which data point it is evaluated on next.
An example:
[[x0, y0], [x1, y1], [x2, y2], ...]
If the model predicts 1 for x0, then the next point it is evaluated on is [x1,y1].
If the model predicts 0 for x0, then the next point it is evaluated on is [x2,y2].
At first I assumed I could simply train the model on all the points and that it would then perform well in the final scenario, where evaluation depends on the previous prediction, but it doesn't.
I wrote a function that computes the evaluation metric in the interdependent fashion described above, and I see that it does not improve when the loss on the entire set of points improves. It doesn't improve even when I train and validate on the same set of data.
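For concreteness, here is a minimal sketch of the kind of dependent evaluation I mean (the step rule follows the example above; the 0.5 threshold and accuracy scoring are placeholder assumptions):

import numpy as np
import xgboost

def dependent_eval(model, X, y):
    # Walk the points: predicting 1 advances to the next point,
    # predicting 0 skips one point (step rule from the example above).
    i, n_correct, n_visited = 0, 0, 0
    while i < len(X):
        prob = model.predict(xgboost.DMatrix(X[i:i + 1]))[0]
        pred = int(prob > 0.5)  # placeholder decision threshold
        n_correct += int(pred == y[i])
        n_visited += 1
        i += 1 if pred == 1 else 2
    return n_correct / max(n_visited, 1)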
So I would like to modify the loss function so that, at each boosting step, it uses the current state of the model to select the subset of training points on which the loss is computed.
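Something like the following is what I have in mind. An xgboost custom objective receives the current ensemble's raw predictions, so the selection could be derived from them and unselected points masked out of the gradients (a sketch; select_mask is a hypothetical helper that would implement the traversal above):

import numpy as np

def masked_logloss(preds, dtrain):
    # preds are the current model's raw margins on the training set,
    # so the selection can depend on the model's current state.
    labels = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))
    mask = select_mask(probs)  # hypothetical: 1 for points the traversal visits, else 0
    grad = (probs - labels) * mask
    hess = probs * (1.0 - probs) * mask
    return grad, hess

# Would be passed via: xgboost.train(params, dtrain, obj=masked_logloss)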
I believe this should be possible in theory. My questions are:
Do you also think it is possible? If so, how can I pass the model to the custom loss?
Thank you in advance!
Upvotes: 0
Views: 53
Reputation: 11
You can achieve that by doing incremental training:
# To run at each new increment:
import xgboost

model = xgboost.train(
    params,
    dtrain=f(df, model),  # f to be implemented with the logic you described; must return an xgboost.DMatrix
    num_boost_round=1,    # adds 1 new tree to the model
    xgb_model=model,      # continue training from the existing model
)
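A rough driver loop under that scheme might look like this (assuming f(df, model) handles model=None on the first round and returns an xgboost.DMatrix):

model = None
for _ in range(total_boost_rounds):
    dtrain = f(df, model)  # subset selection based on the current model
    model = xgboost.train(params, dtrain, num_boost_round=1, xgb_model=model)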
Upvotes: 1