bmath123

Reputation: 41

Python xgb.cv with multiple evaluation metrics

I am currently trying to find the optimal parameters of an XGBoost model. After finding them, I would like to evaluate the model with cross-validation using multiple custom evaluation metrics.

Let's assume I want to use the following two metrics (the actual metrics will differ, but the first one comes from the XGBoost documentation and here I just want to learn how to pass two of them):

import numpy as np
import xgboost as xgb
from typing import Tuple

def rmsle(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    '''Root mean squared log error metric.'''
    y = dtrain.get_label()
    predt[predt < -1] = -1 + 1e-6
    elements = np.power(np.log1p(y) - np.log1p(predt), 2)
    return 'PyRMSLE', float(np.sqrt(np.sum(elements) / len(y)))

def rmsle2(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    '''Doubled root mean squared log error metric.'''
    y = dtrain.get_label()
    predt[predt < -1] = -1 + 1e-6
    elements = np.power(np.log1p(y) - np.log1p(predt), 2)
    # use a distinct metric name so the two results do not collide
    return 'PyRMSLE2', float(2 * np.sqrt(np.sum(elements) / len(y)))

Now I use the following line to fit and cross-validate the model:

cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  feval={rmsle, rmsle2}, early_stopping_rounds=early_stopping_rounds)

Unfortunately this does not work. If I use only one feval metric, e.g. feval=rmsle, it works fine.

I can, however, use two built-in metrics like RMSE or MAE:

cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  metrics={'mae', 'rmse'}, early_stopping_rounds=early_stopping_rounds)

This raises no error, but as soon as I pass more than one custom metric, I get an error.

It would be amazing if anyone could provide me some help here. Thank you very much.

Upvotes: 3

Views: 2481

Answers (2)

bmath123

Reputation: 41

In the end I did it with scikit-learn's cross_validate instead of xgb.cv:

from sklearn.model_selection import cross_validate, KFold

cross_validate(xgb1, X, y, scoring=scorer,
               cv=KFold(n_splits=cv_folds, random_state=seed, shuffle=True),
               verbose=0)

where the scoring dictionary is defined as:

from sklearn.metrics import make_scorer

scorer = {'MAE': make_scorer(MAE, greater_is_better=False),
          'MAPE': make_scorer(MAPE, greater_is_better=False),
          'MdAE': make_scorer(MdAE, greater_is_better=False),
          'MdAPE': make_scorer(MdAPE, greater_is_better=False),
          'In_10': make_scorer(In_10, greater_is_better=True),
          'In_20': make_scorer(In_20, greater_is_better=True)}
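
The functions MAE, MAPE, MdAE, MdAPE, In_10 and In_20 are custom metric functions whose definitions are not shown here; make_scorer only requires a callable that takes the true and predicted values and returns a number. A hypothetical MAPE, purely to illustrate the expected signature:

import numpy as np

def MAPE(y_true, y_pred):
    # hypothetical mean absolute percentage error with the
    # (y_true, y_pred) signature that make_scorer expects
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))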

Upvotes: 1

Rafa

Reputation: 684

According to the documentation, the feval argument is a single evaluation function used to score your model, so only one of them can be passed to the .cv method. However, you can evaluate your CV with the metrics argument, as you already did; it looks like the value you passed is missing its key-value pairs. Try defining it as follows:

cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  metrics={'first_score': rmsle, 'second_score': rmsle2}, early_stopping_rounds=early_stopping_rounds)
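
Depending on the installed xgboost version, another option is to keep the feval argument but wrap both custom metrics in a single callable that returns a list of (name, value) pairs, which xgb.cv can then report side by side. A minimal sketch, assuming the rmsle and rmsle2 functions from the question:

def combined_eval(predt, dtrain):
    # evaluate both custom metrics and return a list of (name, value) pairs
    return [rmsle(predt, dtrain), rmsle2(predt, dtrain)]

cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  feval=combined_eval, early_stopping_rounds=early_stopping_rounds)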

Upvotes: 1
