MTT

Reputation: 5263

xgboost issue with multi:softprob -- label size predict size not match

I'm trying to build a model whose response variable takes three values, [0, 1, 2]. Here is the code:

import xgboost as xgb

def model_jobs(X_train, X_val, y_train, y_val):
    param = {'eta': 0.3,
             'n_estimators': 600,
             'gamma': 2.0,
             'max_depth': 3,
             'min_child_weight': 1.0,
             'subsample': 0.8,
             'max_delta_step': 0.0,
             'colsample_bytree': 1.0,
             'lambda': 1.0,
             'alpha': 1.0,
             'num_class': 3,
             'eval_metric': "aucpr",
             'objective': "multi:softprob",
             'num_boost_round': 20,
             'early_stopping_rounds': 50, }
    model_fitting(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, tag="first", param=param)

def model_fitting(X_train, X_val, y_train, y_val, tag, param):
    # Wrap the training and validation sets so both are evaluated each round.
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    watchlist = [(dtrain, "train"), (dval, "eval")]
    bst = xgb.train(
        dtrain=dtrain,
        evals=watchlist,
        params=param
    )

It returns this error:

Check failed: preds.Size() == info.labels_.Size() (11122134 vs. 3707378) : label size predict size not match

I checked all the sizes and they're completely fine. The table has 3707378 rows, and simple math shows that 11122134 == 3707378 * 3. What puzzles me is that if I change num_class in param from 3 to 4, I get this error:

Check failed: preds.Size() == info.labels_.Size() (14829512 vs. 3707378) : label size predict size not match

and 14829512 == 3707378 * 4. What am I doing wrong? What is the relationship between num_class and this error? My xgboost version is 1.1.1.

Upvotes: 4

Views: 2434

Answers (1)

MTT

The problem is eval_metric. Apparently multi:softprob does not work with aucpr; changing it to mlogloss makes the error go away. I'm not sure why this happens.
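
A likely explanation, offered as a sketch rather than a confirmed reading of the xgboost source: multi:softprob emits one probability per class per row, so the flat prediction buffer holds n_rows * num_class values while the label vector holds n_rows. A metric that expects a single score per row (which appears to be the case for aucpr in 1.1.1) trips the size check, whereas mlogloss reads the per-class block for each row. The helper names below (flat_softprob_size, mlogloss) are hypothetical and written in plain Python for illustration; they are not xgboost functions.

```python
import math

def flat_softprob_size(n_rows, num_class):
    # multi:softprob yields num_class probabilities for every row,
    # so the flattened prediction buffer is n_rows * num_class long.
    return n_rows * num_class

def mlogloss(flat_probs, labels, num_class):
    # Illustrative multiclass log loss over a flat, row-major buffer.
    n_rows = len(labels)
    if len(flat_probs) != n_rows * num_class:
        raise ValueError("expected n_rows * num_class predictions")
    total = 0.0
    for i, label in enumerate(labels):
        # Probability assigned to the true class of row i.
        p = flat_probs[i * num_class + label]
        total -= math.log(max(p, 1e-16))  # clip to avoid log(0)
    return total / n_rows

# The two sizes reported in the error messages.
size_3 = flat_softprob_size(3707378, 3)
size_4 = flat_softprob_size(3707378, 4)

# Two rows, three classes: a multiclass metric accepts 6 predictions for 2 labels.
loss = mlogloss([0.5, 0.25, 0.25, 0.1, 0.8, 0.1], [0, 1], 3)
```

This is why the first number in the error scales with num_class while the second stays at the row count.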

param = {'eta': 0.3,
     'n_estimators': 600,
     'gamma': 2.0,
     'max_depth': 3,
     'min_child_weight': 1.0,
     'subsample': 0.8,
     'max_delta_step': 0.0,
     'colsample_bytree': 1.0,
     'lambda': 1.0,
     'alpha': 1.0,
     'num_class': 3,
     'eval_metric': "mlogloss",
     'objective': "multi:softprob",
     'num_boost_round': 20,
     'early_stopping_rounds': 50, }

Upvotes: 4
