Reputation: 445
I'm working on a classification problem where I have an unbalanced dataset and I am interested in having high precision.
Therefore, I would like to change the objective function for XGBoost to something that allows me to put more weight on precision. The F-beta score (with beta < 1) seems to do just that, but I have a problem with it:
model_xgbm = XGBClassifier(objective=fbeta_score)
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist, n_iter=n_iter_search,
scoring='average_precision')
This works, but I didn't supply a beta (I'm not even sure how it works, since beta is an obligatory parameter...)
model_xgbm = XGBClassifier(objective=fbeta_score(beta=0.5))
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
n_iter=n_iter_search,
scoring='average_precision')
This simply doesn't work ("TypeError: fbeta_score() takes at least 3 arguments (1 given)"). However, I can't really supply it with the other 2 arguments here.
Is there a solution without copying or wrapping the function and pasting as a custom objective?
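For clarity on the error above: sklearn's fbeta_score has the signature fbeta_score(y_true, y_pred, beta=...), so beta alone is not enough. Binding just beta while leaving the other two arguments free would look something like this functools.partial sketch, though I suspect this alone does not fix my underlying problem:
from functools import partial
from sklearn.metrics import fbeta_score

# Bind beta only; y_true and y_pred remain free and are supplied at call time.
fbeta_05 = partial(fbeta_score, beta=0.5)
fbeta_05(y_true, y_pred) then behaves like fbeta_score with beta fixed at 0.5.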
EDIT: I found a function that might be useful: make_scorer, but unfortunately I can't seem to get it to work:
model_xgbm = XGBClassifier(objective=make_scorer(fbeta_score, beta=0.5))
random_search = RandomizedSearchCV(model_xgbm, param_distributions=param_dist,
n_iter=n_iter_search,
scoring='precision')
But this does not work either: "TypeError: __call__() takes at least 4 arguments (3 given)". Note that I don't want to use make_scorer for model selection: I want it to be the objective function of my XGBoost estimator! Thus, the example at the bottom of the aforementioned link does not work for me.
EDIT2: OK, so in fact the problem seems to be that XGBClassifier expects me to provide as the objective a function that returns a gradient and a Hessian... does anyone know a wrapper that would do that for me?
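For reference, my current understanding of the shape it wants is the skeleton below (the function and variable names are mine, not from the XGBoost docs):
import numpy as np

def my_objective(y_true, y_pred):
    # XGBoost wants the first and second derivatives of the loss with
    # respect to the raw predictions, one value per training sample.
    grad = np.zeros_like(y_pred)  # placeholder for dLoss/dy_pred
    hess = np.zeros_like(y_pred)  # placeholder for d2Loss/dy_pred2
    return grad, hess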
Upvotes: 0
Views: 1932
Reputation: 966
Looking at this part of the docstring:
eval_metric : str, callable, optional
If a str, should be a built-in evaluation metric to use. See
doc/parameter.md. If callable, a custom evaluation metric. The call
signature is func(y_predicted, y_true) where y_true will be a
DMatrix object such that you may need to call the get_label
method. It must return a str, value pair where the str is a name
for the evaluation and value is the value of the evaluation
function. This objective is always minimized.
This is actually wrong, as for passing an objective function you require
func(y_true, y_predicted)
It seems that if you wrap your fbeta_score as follows
from sklearn.metrics import fbeta_score

def f_beta_wrapper(y_true, y_pred):
    beta = 0.5
    # note: if y_true arrives as a DMatrix, call y_true.get_label() first
    return fbeta_score(y_true, y_pred, beta=beta)
and pass that in.
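As an aside, a float-returning metric like fbeta_score is a natural fit for eval_metric rather than objective. Per the docstring quoted above it must return a name/value pair; a sketch (thresholding the predictions at 0.5 is my assumption here):
from sklearn.metrics import fbeta_score

def fbeta_eval(y_pred, dtrain):
    # eval_metric call signature per the docstring: predictions first, then
    # a DMatrix whose labels are retrieved with .get_label()
    y_true = dtrain.get_label()
    return 'fbeta@0.5', fbeta_score(y_true, y_pred > 0.5, beta=0.5)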
Passing the wrapper in as the objective flows through correctly, but it then hits the issue you mentioned: fbeta_score returns a single float, not the two outputs (gradient and Hessian) that XGBoost expects. More specifically:
/usr/local/lib/python2.7/site-packages/xgboost/core.pyc in update(self, dtrain, iteration, fobj)
807 else:
808 pred = self.predict(dtrain)
809 grad, hess = fobj(pred, dtrain) # error here
810 self.boost(dtrain, grad, hess)
TypeError: 'numpy.float64' object is not iterable
This makes sense: the objective function is being minimized by gradient boosting, so XGBoost requires the quantities it minimizes with, i.e. the gradient and Hessian of the loss, rather than a scalar score.
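If you want the training objective itself to favour precision, you need something differentiable. One workaround, sketched below under my own assumptions rather than anything fbeta_score gives you, is a logistic loss that up-weights the negative class so that false positives cost more; its gradient and Hessian are straightforward:
import numpy as np
from xgboost import XGBClassifier

def precision_weighted_logloss(y_true, y_pred):
    # Logistic loss over raw margin scores with negatives up-weighted,
    # so false positives are penalized more heavily; w is illustrative.
    w = 4.0
    p = 1.0 / (1.0 + np.exp(-y_pred))   # sigmoid of the raw scores
    weights = np.where(y_true == 0, w, 1.0)
    grad = weights * (p - y_true)       # first derivative of the loss
    hess = weights * p * (1.0 - p)      # second derivative of the loss
    return grad, hess

model_xgbm = XGBClassifier(objective=precision_weighted_logloss)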
Upvotes: 1