N08

Reputation: 1315

Early stopping with GridSearchCV - use hold-out set of CV for validation

I want to employ the early stopping option with scikit-learn's GridSearchCV method. An example of this is shown in this SO thread:

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

testX = trainX
testY = trainY

param_grid = {"subsample": [0.5, 0.8],
              "n_estimators": [600]}

fit_params = {"early_stopping_rounds": 1,
              "eval_set": [(testX, testY)]}

model = xgb.XGBRegressor()
gridsearch = GridSearchCV(estimator=model,
                          param_grid=param_grid,
                          fit_params=fit_params,
                          verbose=1,
                          cv=2)
gridsearch.fit(trainX, trainY)

However, I would like to use the hold-out set of the cross-validation process as the validation set. Is there a way to specify this in GridSearchCV?

Upvotes: 5

Views: 8886

Answers (2)

Eran Moshe

Reputation: 3208

Back in the day I built a class wrapping the "HyperOpt" package to suit my needs.

I'll try to quickly minimize it for you so you can use it. Here's the code, with some notes at the end to help you solve your problem:

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
import xgboost as xgb
max_float_digits = 4


def rounded(val):
    return '{:.{}f}'.format(val, max_float_digits)


class HyperOptTuner(object):
    """
    Tune my parameters!
    """
    def __init__(self, dtrain, dvalid, early_stopping=200, max_evals=200):
        self.counter = 0
        self.dtrain = dtrain
        self.dvalid = dvalid
        self.early_stopping = early_stopping
        self.max_evals = max_evals
        self.tuned_params = None

    def score(self, params):
        self.counter += 1
        # Edit params
        print("Iteration {}/{}".format(self.counter, self.max_evals))
        num_round = int(params['n_estimators'])
        del params['n_estimators']

        watchlist = [(self.dtrain, 'train'), (self.dvalid, 'eval')]
        model = xgb.train(params, self.dtrain, num_round, evals=watchlist, early_stopping_rounds=self.early_stopping,
                          verbose_eval=False)
        best_n_rounds = model.best_ntree_limit
        score = model.best_score
        params['n_estimators'] = best_n_rounds
        params = dict([(key, rounded(params[key]))
                       if type(params[key]) == float
                       else (key, params[key])
                       for key in params])

        print("Trained with: ")
        print(params)
        print("\tScore {0}\n".format(score))
        return {'loss': 1 - score, 'status': STATUS_OK, 'params': params}

    def optimize(self, trials):
        space = {
            'n_estimators': 2000,  # hp.quniform('n_estimators', 10, 1000, 10),
            'eta': hp.quniform('eta', 0.025, 0.3, 0.025),
            'max_depth': hp.choice('max_depth', np.arange(1, 9, dtype=int)),
            'min_child_weight': hp.choice('min_child_weight', np.arange(1, 10, dtype=int)),
            'subsample': hp.quniform('subsample', 0.3, 1, 0.05),
            'gamma': hp.quniform('gamma', 0.1, 20, 0.1),
            'colsample_bytree': hp.quniform('colsample_bytree', 0.5, 1, 0.25),
            'eval_metric': 'map',
            'objective': 'rank:pairwise',
            'silent': 1
        }

        fmin(self.score, space, algo=tpe.suggest, trials=trials, max_evals=self.max_evals)

        min_loss = 1
        min_params = {}
        for trial in trials.trials:
            tmp_loss, tmp_params = trial['result']['loss'], trial['result']['params']
            if tmp_loss < min_loss:
                min_loss, min_params = tmp_loss, tmp_params

        print("Winning params:")
        print(min_params)
        print "\tScore: {}".format(1-min_loss)
        self.tuned_params = min_params

    def tune(self):
        print "Tuning...\n"
        # Trials object where the history of search will be stored
        trials = Trials()
        self.optimize(trials)

So I've used a class, mainly to define parameters and save results for further use. There are two main functions:

  1. optimize() defines our "search space", finds the hyperparameters that minimize the error (so do note that you are MINIMIZING an error), and saves the best set it has found. I also added some prints to help you follow the process.

  2. score() calculates the score of a model for a given set of hyperparameters from the "search space". It uses early stopping as defined in the class. Since I didn't need cross-validation I used xgb.train(), but you can change it to xgb.cv(), which does support early_stopping_rounds (a sketch of that swap follows this list). I also added prints there to help you follow the process. score() returns 1 - score, because I was calculating MAP, a metric that needs to be maximized; if you are calculating an error like RMSE, just return score as is.
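For example, a minimal sketch of that swap inside score() might look like this (untested; nfold=5 is an assumption, and the result column name follows from the eval_metric='map' set in the search space):

# Sketch only: replace the xgb.train() call in score() with xgb.cv(),
# so early stopping is judged on the CV folds instead of a fixed dvalid.
cv_results = xgb.cv(params, self.dtrain,
                    num_boost_round=num_round,
                    nfold=5,  # assumption: pick your own fold count
                    early_stopping_rounds=self.early_stopping,
                    verbose_eval=False)
# xgb.cv returns a DataFrame truncated at the best iteration
best_n_rounds = cv_results.shape[0]
score = cv_results['test-map-mean'].iloc[-1]  # 'map' comes from the space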

This is how you activate it from your code, once you have your dtrain and dtest matrices:

# dtrain is a training set of type DMatrix
# dtest is a testing set of type DMatrix
tuner = HyperOptTuner(dtrain=dtrain, dvalid=dtest, early_stopping=200, max_evals=400)
tuner.tune()

Where max_evals is the size of the "search grid", i.e. the number of evaluations hyperopt will run.
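Once tune() finishes, the winning parameters sit in tuner.tuned_params; a rough sketch of training a final model with them (my own usage assumption, untested):

# score() already wrote the early-stopped round count back into
# 'n_estimators', so pop it out and train the final booster with the rest
best = dict(tuner.tuned_params)
num_round = int(best.pop('n_estimators'))
final_model = xgb.train(best, dtrain, num_round)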

Follow these guidelines and let me know if you're having trouble.

Upvotes: 1

00__00__00

Reputation: 5327

This is not possible with the present implementation of xgboost (referring to versions 0.6 and 0.7). Please be careful about the difference between the native xgboost API

xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None,
              feval=None, maximize=False, early_stopping_rounds=None,
              evals_result=None, verbose_eval=True, xgb_model=None,
              callbacks=None, learning_rates=None)

or

xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
           folds=None, metrics=(), obj=None, feval=None, maximize=False,
           early_stopping_rounds=None, fpreproc=None, as_pandas=True,
           verbose_eval=None, show_stdv=True, seed=0, callbacks=None,
           shuffle=True)

and the sklearn interface:

class xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100,
                           silent=True, objective='reg:linear', booster='gbtree',
                           n_jobs=1, nthread=None, gamma=0, min_child_weight=1,
                           max_delta_step=0, subsample=1, colsample_bytree=1,
                           colsample_bylevel=1, reg_alpha=0, reg_lambda=1,
                           scale_pos_weight=1, base_score=0.5, random_state=0,
                           seed=None, missing=None, **kwargs)

As you can see, there is no such thing as early stopping for xgboost.XGBRegressor. Note that the sklearn interface is the only one you can use in combination with GridSearchCV, which requires a proper sklearn estimator with .fit(), .predict(), etc.

You could pass your early_stopping_rounds and eval_set as extra fit_params to GridSearchCV, and that would actually work. However, GridSearchCV will not change the fit_params between the different folds, so you would end up using the same eval_set in all the folds, which might not be what you mean by CV:

model = xgb.XGBClassifier()
clf = GridSearchCV(model, parameters,
                   fit_params={'early_stopping_rounds': 20,
                               'eval_set': [(X, y)]},
                   cv=kfold)
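If you want each fold's hold-out set to actually serve as the eval_set, you have to run the loop yourself. A rough sketch (untested; the hyperparameter values are placeholders, and X, y are assumed to be numpy arrays):

import numpy as np
import xgboost as xgb
from sklearn.model_selection import KFold

kf = KFold(n_splits=5)
scores = []
for train_idx, valid_idx in kf.split(X):
    # each fold's hold-out set becomes the early-stopping eval_set
    model = xgb.XGBRegressor(n_estimators=600, subsample=0.8)
    model.fit(X[train_idx], y[train_idx],
              early_stopping_rounds=20,
              eval_set=[(X[valid_idx], y[valid_idx])],
              verbose=False)
    scores.append(model.best_score)
print(np.mean(scores))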

After some tweaking, I found that the safest way to integrate early_stopping_rounds with the sklearn API is to implement an early stopping mechanism yourself. You can do it by running GridSearchCV with the number of rounds (n_estimators) as a parameter to be tuned. You can then watch the mean validation score for the models with increasing n_estimators and define a custom heuristic for early stopping; you will notice that the default one is not optimal, so to say.
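A minimal sketch of that idea (assuming X, y are your training data; the grid values are placeholders, and in recent sklearn versions the per-candidate mean validation scores live in cv_results_['mean_test_score']):

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# tune the number of rounds explicitly and inspect the mean CV score per value
param_grid = {'n_estimators': [50, 100, 200, 400, 800]}
clf = GridSearchCV(xgb.XGBRegressor(), param_grid, cv=5)
clf.fit(X, y)

# your custom "early stop" heuristic is whatever rule you apply to this curve,
# e.g. stop increasing n_estimators once the score plateaus
for n, s in zip(param_grid['n_estimators'], clf.cv_results_['mean_test_score']):
    print(n, s)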

I think it is also a better approach than using a single hold-out split for this purpose.

Upvotes: 2
