Reputation: 11
I am using xgb.train() in the xgboost R package to fit a classification model. I am trying to figure out the best iteration at which to stop growing trees. I set early_stop_rounds=6 and, by watching each iteration's metrics, I can clearly see that the AUC on the validation data reaches its maximum and then decreases. However, the model does not stop and keeps going until the specified nrounds is reached.
Question 1: Is the best model (for the given parameters) the one defined at the iteration where validation performance starts to decrease?
Question 2: Why does the model not stop when the AUC on the validation set starts to decrease?
Question 3: What does maximize=FALSE mean? What will make training stop if it is set to FALSE? Does it have to be FALSE when early_stop_rounds is set?
Question 4: How does the model know which entry in the watchlist is the validation data? I've seen people use test=, eval=, validation1=, etc.
Thank you!
param<-list(
objective="binary:logistic",
booster="gbtree",
eta=0.02, #Control the learning rate
max.depth=3, #Maximum depth of the tree
subsample=0.8, #subsample ratio of the training instance
colsample_bytree=0.5 # subsample ratio of columns when constructing each tree
)
watchlist<-list(train=mtrain,validation=mtest)
sgb_model<-xgb.train(params=param, # this is the modeling parameter set above
data = mtrain,
scale_pos_weight=1,
max_delta_step=1,
missing=NA,
nthread=2,
nrounds = 500, # run up to 500 rounds in total
verbose=2,
early_stop_rounds=6, # stop if performance has not improved for 6 rounds
watchlist=watchlist,
maximize=FALSE,
eval.metric="auc" # evaluate the model by maximizing AUC
#metric_name = 'validation-auc'
)
Upvotes: 1
Views: 415
Reputation: 3554
maximize=FALSE
is for custom evaluation metrics that should be minimized (say, a custom merror-type metric). You always want to maximize/increase AUC, so maximize=TRUE is what you need.
Upvotes: 1