Reputation: 5263
I'm trying to build a model whose response variable takes three values, [0, 1, 2]. Here is the code:
import xgboost as xgb


def model_jobs(X_train, X_val, y_train, y_val):
    param = {'eta': 0.3,
             'n_estimators': 600,
             'gamma': 2.0,
             'max_depth': 3,
             'min_child_weight': 1.0,
             'subsample': 0.8,
             'max_delta_step': 0.0,
             'colsample_bytree': 1.0,
             'lambda': 1.0,
             'alpha': 1.0,
             'num_class': 3,
             'eval_metric': "aucpr",
             'objective': "multi:softprob",
             'num_boost_round': 20,
             'early_stopping_rounds': 50, }
    model_fitting(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, tag="first", param=param)


def model_fitting(X_train, X_val, y_train, y_val, tag, param):
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    watchlist = [(dtrain, "train"), (dval, "eval")]
    bst = xgb.train(
        dtrain=dtrain,
        evals=watchlist,
        params=param
    )
It returns this error:
Check failed: preds.Size() == info.labels_.Size() (11122134 vs. 3707378) : label size predict size not match
I checked all the sizes and they are fine. The table has 3707378 rows, and simple arithmetic shows that 11122134 == 3707378 * 3. What puzzles me is that if I change num_class in param from 3 to 4, I get this error instead:
Check failed: preds.Size() == info.labels_.Size() (14829512 vs. 3707378) : label size predict size not match
and 14829512 == 3707378 * 4. What am I doing wrong? What is the relationship between num_class and this error? My xgboost version is 1.1.1.
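The size arithmetic above can be checked with a small sketch. The helper name class_multiple is hypothetical, and the numbers are copied from the two error messages. The result is consistent with multi:softprob producing one probability per class for each row, so the prediction count would be the number of rows times num_class.

```python
def class_multiple(n_preds, n_labels):
    """Hypothetical helper: how many predictions exist per label.

    Returns (ratio, remainder). A remainder of 0 means the prediction
    count is an exact multiple of the label count.
    """
    return divmod(n_preds, n_labels)


# Sizes copied from the error messages: num_class -> preds.Size()
n_labels = 3707378
pred_sizes = {3: 11122134, 4: 14829512}

for num_class, n_preds in pred_sizes.items():
    ratio, remainder = class_multiple(n_preds, n_labels)
    # In both cases the ratio matches num_class and nothing is left over.
    print(num_class, ratio, remainder)
```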
Upvotes: 4
Views: 2434
Reputation: 5263
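The problem is eval_metric. Apparently multi:softprob does not work with aucpr in this version, so we need to change it to mlogloss. I'm not sure of the exact reason, but the error is consistent with aucpr expecting one prediction per label while multi:softprob produces num_class predictions per row.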
param = {'eta': 0.3,
         'n_estimators': 600,
         'gamma': 2.0,
         'max_depth': 3,
         'min_child_weight': 1.0,
         'subsample': 0.8,
         'max_delta_step': 0.0,
         'colsample_bytree': 1.0,
         'lambda': 1.0,
         'alpha': 1.0,
         'num_class': 3,
         'eval_metric': "mlogloss",
         'objective': "multi:softprob",
         'num_boost_round': 20,
         'early_stopping_rounds': 50, }
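One more thing that may be worth checking in the dict above: num_boost_round and early_stopping_rounds are keyword arguments of xgb.train() rather than booster parameters, and n_estimators belongs to the scikit-learn wrapper (for example XGBClassifier). Left inside params, they will probably not have the intended effect. Below is a minimal sketch of one way to separate them. The helper split_params and the trimmed model_fitting signature are assumptions for illustration, not part of the original code.

```python
# Keys that xgb.train() accepts as keyword arguments, not as booster params.
TRAIN_KWARGS = ("num_boost_round", "early_stopping_rounds")
# Key used by the scikit-learn wrapper, not by xgb.train().
SKLEARN_ONLY = ("n_estimators",)


def split_params(param):
    """Hypothetical helper: separate booster params from xgb.train() kwargs."""
    train_kwargs = {k: param[k] for k in TRAIN_KWARGS if k in param}
    booster_params = {
        k: v for k, v in param.items()
        if k not in TRAIN_KWARGS and k not in SKLEARN_ONLY
    }
    return booster_params, train_kwargs


def model_fitting(X_train, X_val, y_train, y_val, param):
    # Imported here so split_params can be used without xgboost installed.
    import xgboost as xgb

    booster_params, train_kwargs = split_params(param)
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    # When evals has several entries, the last one drives early stopping.
    watchlist = [(dtrain, "train"), (dval, "eval")]
    return xgb.train(booster_params, dtrain, evals=watchlist, **train_kwargs)
```

With the dict above, split_params would hand num_boost_round=20 and early_stopping_rounds=50 to xgb.train() directly and drop n_estimators. Note that with only 20 boosting rounds, a patience of 50 rounds can never trigger, so one of the two values probably needs adjusting.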
Upvotes: 4