JohnnyQ

Reputation: 445

Using an evaluation function with a parameter in XGBoost (f_beta)

I'm working on a classification problem with an imbalanced dataset, and I am interested in high precision.

Therefore, I would like to change the objective function of XGBoost to something that puts more weight on precision. The F_beta score seems to do just that, but I have a problem with it:

model_xgbm = XGBClassifier(objective=fbeta_score)
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   scoring='average_precision')

This works, but I didn't supply a beta (I'm not even sure how it works, since beta is an obligatory parameter...)

model_xgbm = XGBClassifier(objective=fbeta_score(beta=0.5))
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   scoring='average_precision')

This simply doesn't work ("TypeError: fbeta_score() takes at least 3 arguments (1 given)"), and I can't really supply the other two arguments here.

Is there a solution without copying or wrapping the function and pasting as a custom objective?
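(One way to bind just the beta argument is functools.partial; a minimal sketch, though as EDIT2 below explains, the result still returns a single float and therefore still cannot serve as the XGBoost objective:)

from functools import partial
from sklearn.metrics import fbeta_score

# binds beta=0.5 so the resulting callable takes only (y_true, y_pred);
# it still returns one float, not the (grad, hess) pair XGBoost needs
fbeta_half = partial(fbeta_score, beta=0.5)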

EDIT: I found a function that might be useful, make_scorer, but unfortunately I can't seem to get it to work:

model_xgbm = XGBClassifier(objective=make_scorer(fbeta_score, beta=0.5))
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   scoring='precision')

But this does not work either: "TypeError: __call__() takes at least 4 arguments (3 given)". Note that I don't want to use it for model selection: I want it to be the objective function of my XGBoost estimator! Thus, the example at the bottom of the aforementioned link does not work for me.
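(For reference, make_scorer builds a scorer for model selection, which is why it fits RandomizedSearchCV's scoring argument rather than the estimator's objective; a minimal sketch of that usage, reusing the names from the snippets above:)

from sklearn.metrics import fbeta_score, make_scorer

# make_scorer wraps a metric into a scorer(estimator, X, y) callable,
# which is what RandomizedSearchCV's `scoring` argument expects
fbeta_scorer = make_scorer(fbeta_score, beta=0.5)
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   scoring=fbeta_scorer)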

EDIT2: OK, so in fact the problem seems to be that XGBClassifier expects the objective to be a function that returns a gradient and a Hessian... does anyone know a wrapper that would do that for me?
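(For illustration, a minimal sketch of the interface XGBoost expects from a custom objective, using the native API's fobj(preds, dtrain) signature. The weighted logistic loss here is an assumption, not an F_beta objective: F_beta is computed over the whole prediction set and is not differentiable per example, so it cannot be plugged in directly. Up-weighting negatives is one differentiable proxy for favoring precision:)

import numpy as np

def precision_weighted_logloss(y_pred, dtrain):
    # hypothetical objective: logistic loss with extra weight on negatives,
    # so false positives cost more (a differentiable proxy for precision)
    y_true = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-y_pred))    # sigmoid of the raw margin
    w = np.where(y_true == 0, 2.0, 1.0)  # weight of 2 on negatives (assumption)
    grad = w * (p - y_true)              # first derivative of the loss
    hess = w * p * (1.0 - p)             # second derivative of the loss
    return grad, hess

# usage sketch: bst = xgb.train(params, dtrain, num_boost_round=100,
#                               obj=precision_weighted_logloss)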

Upvotes: 0

Views: 1932

Answers (1)

Chinny84

Reputation: 966

Looking at this part of the XGBoost docstring:

eval_metric : str, callable, optional
        If a str, should be a built-in evaluation metric to use. See
        doc/parameter.md. If callable, a custom evaluation metric. The call
        signature is func(y_predicted, y_true) where y_true will be a
        DMatrix object such that you may need to call the get_label
        method. It must return a str, value pair where the str is a name
        for the evaluation and value is the value of the evaluation
        function. This objective is always minimized.

This is actually wrong as you require

func(y_true, y_predicted)

for passing an objective function.

It seems that if you wrap fbeta_score as follows

def f_beta_wrapper(y_true, y_pred):
    beta = 0.5
    # note: if y_true is a DMatrix, call .get_label() on it first
    return fbeta_score(y_true, y_pred, beta=beta)

and pass that in.

It flows through correctly, but then it hits the issue you mentioned: fbeta_score returns a single float, not the two outputs (gradient and Hessian) that XGBoost expects to compute the boosting step from. More specifically:

/usr/local/lib/python2.7/site-packages/xgboost/core.pyc in update(self, dtrain, iteration, fobj)
    807         else:
    808             pred = self.predict(dtrain)
--> 809             grad, hess = fobj(pred, dtrain)  # error here
    810             self.boost(dtrain, grad, hess)

TypeError: 'numpy.float64' object is not iterable

This makes sense: the objective is minimized by gradient boosting, so XGBoost requires the first and second derivatives (gradient and Hessian) of the loss with respect to the predictions, not a scalar score.
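If the goal is to monitor F_beta during training rather than optimize it directly, a custom evaluation metric returning the (name, value) pair described in the docstring above is a workable alternative. A minimal sketch, assuming the native API where the metric receives probabilities and a DMatrix (the fbeta_eval name and the 0.5 threshold are illustrative choices):

from sklearn.metrics import fbeta_score

def fbeta_eval(y_pred, dtrain):
    # custom eval metric: must return a (name, value) pair
    y_true = dtrain.get_label()
    y_label = (y_pred > 0.5).astype(int)  # threshold probabilities (assumption)
    return 'fbeta', fbeta_score(y_true, y_label, beta=0.5)

# usage sketch: bst = xgb.train(params, dtrain, num_boost_round=100,
#                               evals=[(dvalid, 'valid')], feval=fbeta_eval)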

Upvotes: 1
