Jacob Zhu

Reputation: 11

xgboost R package early_stop_rounds does not trigger

I am using xgb.train() from the xgboost R package to fit a classification model, and I am trying to figure out the best iteration at which to stop growing trees. I set early_stop_rounds=6 and, by watching each iteration's metrics, I can clearly see that the AUC on the validation data reaches its maximum and then decreases. However, the model does not stop and keeps going until the specified nrounds is reached.

Question 1: Is the best model (for the given parameters) the one at the iteration where the validation performance starts to decrease?

Question 2: Why does the model not stop when the AUC on the validation set starts to decrease?

Question 3: What does maximize=FALSE mean? What will make it stop if it is set to FALSE? Does it have to be FALSE when early_stop_rounds is set?

Question 4: How does the model know which entry in the watchlist is the validation data? I've seen people use test=, eval=, validation1=, etc.

Thank you!

param<-list(
  objective="binary:logistic",
  booster="gbtree",
  eta=0.02, #Control the learning rate
  max.depth=3, #Maximum depth of the tree
  subsample=0.8, #subsample ratio of the training instance
  colsample_bytree=0.5 # subsample ratio of columns when constructing each tree
)

watchlist<-list(train=mtrain,validation=mtest)

sgb_model<-xgb.train(params=param, # this is the modeling parameter set above
                 data = mtrain,
                 scale_pos_weight=1,
                 max_delta_step=1,
                 missing=NA,
                 nthread=2,
                 nrounds = 500, # total of 500 rounds
                 verbose=2,
                 early_stop_rounds=6, #if performance not improving for 6 rounds, model iteration stops
                 watchlist=watchlist,
                 maximize=FALSE,
                 eval.metric="auc" #Maximize AUC to evaluate model
                 #metric_name = 'validation-auc'
                 )

Upvotes: 1

Views: 415

Answers (1)

abhiieor

Reputation: 3554

  • Answer 1: No, not the best, but good enough from a bias-variance tradeoff point of view.
  • Answer 2: It works; maybe there is some problem with your code. Could you please share the progress output of the train and test set AUCs at each boosting step to demonstrate this? If you are 100% sure it's not working, then you can file an issue in the XGBoost git project.
  • Answer 3: maximize=FALSE is for custom evaluation functions (say, a custom merror type of metric) that should be minimized. You always want to maximize/increase AUC, so maximize=TRUE is better for you.
  • Answer 4: It's mostly position based. The train part comes first; the next entry is treated as the validation/evaluation set.
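
As a side note, in the versions of the R package I'm aware of, the argument is spelled early_stopping_rounds (older releases used early.stop.round), not early_stop_rounds; an unrecognized name is silently swallowed by ..., which could explain why stopping never triggers. A minimal sketch of a call with early stopping active might look like this (the data objects train_matrix, train_labels, test_matrix, and test_labels are placeholders; check the argument names against your installed version's documentation):

```r
library(xgboost)

# Placeholder data; substitute your own feature matrices and 0/1 labels
dtrain <- xgb.DMatrix(data = train_matrix, label = train_labels)
dtest  <- xgb.DMatrix(data = test_matrix,  label = test_labels)

param <- list(
  objective   = "binary:logistic",
  eval_metric = "auc",   # note: eval_metric, not eval.metric
  eta         = 0.02,
  max_depth   = 3
)

# By convention the last watchlist entry is the one monitored for early stopping
watchlist <- list(train = dtrain, validation = dtest)

model <- xgb.train(
  params                = param,
  data                  = dtrain,
  nrounds               = 500,
  watchlist             = watchlist,
  early_stopping_rounds = 6,    # stop if validation AUC does not improve for 6 rounds
  maximize              = TRUE  # AUC is a metric to be maximized
)

# After training, model$best_iteration holds the round with the best validation AUC
```

With a recognized metric such as "auc", recent versions infer maximize automatically, but setting it explicitly does no harm.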

Upvotes: 1
