dturk

Reputation: 1

XGBoost Cross-Validation Error: Error in slice.xgb.DMatrix

I am trying to run the xgb.cv function for a multiclass classification problem in R (via RStudio), and I keep getting the following error message:

Error in slice.xgb.DMatrix(dall, unlist(folds[-k])) : std::bad_alloc

My data set includes a 4-category response variable and 12 explanatory variables. I converted the response variable to a numeric data type, split my data into train and test groups (80/20), created a sparse matrix using one-hot encoding, and then built my xgb.DMatrix objects with the following code:

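library(Matrix)   # sparse.model.matrix()
library(xgboost)  # xgb.DMatrix(), xgb.cv()

# One-hot encode the predictors; "- 1" drops the intercept column
TrainM <- sparse.model.matrix(Response_var ~ . - 1, data = Train_data)
Train_Label <- Train_data$Response_var  # numeric class labels
# as.matrix() turns the sparse matrix back into a dense one
Train_Matrix <- xgb.DMatrix(data = as.matrix(TrainM), label = Train_Label)

TestM <- sparse.model.matrix(Response_var ~ . - 1, data = Test_data)
Test_Label <- Test_data$Response_var
Test_Matrix <- xgb.DMatrix(data = as.matrix(TestM), label = Test_Label)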
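The as.matrix() calls above convert the sparse one-hot matrices back into dense ones, where every zero is stored as an 8-byte double. The sketch below compares the two footprints with base R's object.size() on a made-up data frame (`toy`, `f1` and `x1` are hypothetical names, not the asker's data). Since xgb.DMatrix() also accepts the dgCMatrix returned by sparse.model.matrix() directly, passing TrainM without as.matrix() is one way to keep the sparse representation.

```r
library(Matrix)  # sparse.model.matrix(); ships with standard R installations

set.seed(1)
n <- 10000
# Hypothetical data: one high-cardinality factor and one numeric column
toy <- data.frame(
  y  = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  f1 = factor(sample(sprintf("lvl%03d", 1:300), n, replace = TRUE)),
  x1 = rnorm(n)
)

# One-hot encode the predictors, as in the question
X_sparse <- sparse.model.matrix(y ~ . - 1, data = toy)
X_dense  <- as.matrix(X_sparse)  # the same conversion as as.matrix(TrainM)

sparse_mb <- as.numeric(object.size(X_sparse)) / 1024^2
dense_mb  <- as.numeric(object.size(X_dense)) / 1024^2
cat(sprintf("sparse: %.1f MB, dense: %.1f MB\n", sparse_mb, dense_mb))
```

With only two non-zero entries per row, the dense copy is many times larger than the sparse one, and the gap grows with the number of one-hot columns.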

I then set the model parameters as follows:

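nc <- length(unique(Train_Label))  # number of classes; labels must run 0 .. nc - 1
xgb_params <- list(objective = "multi:softprob",  # one probability per class
                   eta = 0.01,
                   gamma = 2,
                   eval_metric = 'AUC',  # XGBoost's built-in metric names are lower-case ('auc', 'mlogloss')
                   max_depth = 15,
                   subsample = 0.5,
                   colsample_bytree = 0.5,
                   num_class = nc,
                   min_child_weight = 2)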
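XGBoost's multi:softprob objective expects class labels to be integers from 0 to num_class - 1, so `nc <- length(unique(Train_Label))` is only a valid num_class if the numeric response is coded that way. A minimal base-R sketch of such a mapping follows; the helper `to_xgb_labels` and the example labels are hypothetical. Passing the training levels explicitly keeps the train and test codings identical.

```r
# Map a class response (factor, character or numeric codes) to 0-based integers.
# `levels` fixes the coding so train and test sets share the same mapping.
to_xgb_labels <- function(y, levels = sort(unique(y))) {
  as.integer(factor(y, levels = levels)) - 1L
}

y_train <- c("low", "high", "mid", "high", "very high", "low")  # hypothetical labels
train_levels <- sort(unique(y_train))
lab <- to_xgb_labels(y_train, train_levels)
nc <- length(train_levels)

# Every label must fall in 0 .. nc - 1 for multi:softprob
stopifnot(all(lab >= 0L), all(lab < nc))
print(lab)
```

A test label that never appears in the training data comes back as NA, which makes that mismatch easy to spot before building the DMatrix.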

I then run the cross-validated model as follows:

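CV_Model <- xgb.cv(params = xgb_params,
                   data = Train_Matrix,
                   nrounds = 1000,
                   nfold = 10,                  # 10 train/test splits of Train_Matrix
                   stratified = TRUE,           # keep class proportions in each fold
                   print_every_n = 1,
                   early_stopping_rounds = 15,
                   maximize = FALSE,            # treats lower scores as better; AUC is higher-is-better
                   prediction = TRUE)           # return the out-of-fold predictions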
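The failing call in the error, `slice.xgb.DMatrix(dall, unlist(folds[-k]))`, builds the training part of fold k from the row indices of all the other folds. xgb.cv() also accepts a `folds` argument: a list whose elements are the test-row indices of each fold (when it is supplied, `nfold` and `stratified` are ignored). Building the folds yourself lets you inspect them before any slicing happens. The sketch below is one simple way to make stratified folds in base R; `make_stratified_folds` is a hypothetical helper, not part of xgboost.

```r
# Stratified k-fold split: each class is shuffled and dealt round-robin into folds,
# so every fold gets roughly the same class proportions.
# Returns a list of test-index vectors, the shape xgb.cv(folds = ...) expects.
make_stratified_folds <- function(y, nfold = 10, seed = 42) {
  set.seed(seed)
  fold_id <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]  # shuffle the rows of this class
    fold_id[idx] <- rep_len(seq_len(nfold), length(idx))
  }
  unname(split(seq_along(y), factor(fold_id, levels = seq_len(nfold))))
}

y <- rep(0:3, times = c(40, 30, 20, 10))  # hypothetical 4-class labels
folds <- make_stratified_folds(y, nfold = 10)
print(lengths(folds))
print(table(y[folds[[1]]]))
```

The result could then be passed as, for example, `folds = make_stratified_folds(Train_Label, 10)` in place of `nfold` and `stratified`. This does not by itself reduce memory use; it only makes the fold construction explicit.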

Everything runs fine until I kick off the CV model, which errors out very quickly (just as the model is initializing):

Error in slice.xgb.DMatrix(dall, unlist(folds[-k])) : std::bad_alloc

I am running this on a Windows 10 workstation using R v4.1.1 and RStudio v1.4.1106. I should note that I have been running the "same" code for several weeks with no issues; the only difference is that the evaluation metric was 'mlogloss' instead of 'AUC'. As soon as I switched to 'AUC', the issue began to occur.

Any help resolving this would be very much appreciated!

Upvotes: 0

Views: 726

Answers (1)

Jagge

Reputation: 978

Hi Doug Turk, and welcome to the site. This error most likely has something to do with a lack of memory: std::bad_alloc is the exception C++ throws when a memory allocation fails.

See for example here: https://en.cppreference.com/w/cpp/memory/new/bad_alloc

On Windows you can check this by opening Task Manager while the code runs; you should see memory usage climb to 100%. Try rerunning the code on a subset of your data to reduce the memory requirements and see whether that fixes the problem.
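A quick way to gauge whether the data can fit is to estimate the size of a dense version of the training matrix: an R double matrix needs 8 bytes per cell. With 10-fold CV, xgb.cv() slices a separate training set for each fold, so several copies of the data may be in memory at once. The sketch below uses two hypothetical helpers, `dense_matrix_mib` to estimate that size and `subset_rows` to draw a random subset of rows for the rerun suggested above.

```r
# Approximate size, in MiB, of a dense R double matrix (8 bytes per cell).
dense_matrix_mib <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^2
}

# Draw a random fraction of rows so the pipeline can be rerun on less data.
subset_rows <- function(df, frac = 0.25, seed = 1) {
  set.seed(seed)
  keep <- sample.int(nrow(df), size = floor(frac * nrow(df)))
  df[sort(keep), , drop = FALSE]
}

# Hypothetical dimensions: 1e6 rows, 200 one-hot columns
print(dense_matrix_mib(1e6, 200))  # about 1526 MiB

# The estimate matches object.size() up to a small header overhead
m <- matrix(0, nrow = 1000, ncol = 50)
print(as.numeric(object.size(m)) / 1024^2)
print(dense_matrix_mib(1000, 50))

small <- subset_rows(data.frame(id = 1:100, x = rnorm(100)), frac = 0.25)
print(nrow(small))  # 25
```

In the question's terms, something like `subset_rows(Train_data, 0.25)` would replace `Train_data` before the sparse.model.matrix() step.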

Good luck.

Upvotes: 2
