Reputation: 11
I am using xgb.train() in the xgboost R package to fit a classification model. I am trying to figure out the best iteration at which to stop growing trees. I set early_stop_rounds=6 and, by watching each iteration's metrics, I can clearly see that the AUC on the validation data reaches its maximum and then decreases. However, the model does not stop and keeps going until the specified nrounds is reached.
Question 1: Is the best model (for the given parameters) the one defined at the iteration where validation performance starts to decrease?
Question 2: Why does the model not stop when the AUC on the validation set starts to decrease?
Question 3: What does maximize=FALSE mean? What will make training stop if it is set to FALSE? Does it have to be FALSE when early_stop_rounds is set?
Question 4: How does the model know which entry in the watchlist is the validation data? I've seen people use test=, eval=, validation1=, etc.
Thank you!
param<-list(
objective="binary:logistic",
booster="gbtree",
eta=0.02, #Control the learning rate
max.depth=3, #Maximum depth of the tree
subsample=0.8, #subsample ratio of the training instance
colsample_bytree=0.5 # subsample ratio of columns when constructing each tree
)
watchlist<-list(train=mtrain,validation=mtest)
sgb_model<-xgb.train(params=param, # this is the modeling parameter set above
data = mtrain,
scale_pos_weight=1,
max_delta_step=1,
missing=NA,
nthread=2,
nrounds = 500, # run up to 500 rounds in total
verbose=2,
early_stop_rounds=6, # stop if performance has not improved for 6 rounds
watchlist=watchlist,
maximize=FALSE,
eval.metric="auc" # evaluate the model by maximizing AUC
#metric_name = 'validation-auc'
)
Upvotes: 1
Views: 415
Reputation: 3554
maximize=FALSE
is for custom evaluation metrics that should be minimized (say, a custom merror-type metric). You always want to maximize/increase AUC, so maximize=TRUE is what you need.
Upvotes: 1