drgxfs

Reputation: 1127

Parallel processing with xgboost and caret

I want to parallelize the model-fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads to use while fitting the models, in the sense of building the trees in parallel. Caret's train function, on the other hand, parallelizes at the resampling level, for example by running one process per iteration of a k-fold CV. Is this understanding correct? If so, is it better to:

  1. Register the number of cores (for example, with the doMC package and the registerDoMC function), set nthread=1 via caret's train function so it passes that parameter to xgboost, set allowParallel=TRUE in trainControl, and let caret handle the parallelization for the cross-validation; or
  2. Disable caret parallelization (allowParallel=FALSE and no parallel back-end registration) and set nthread to the number of physical cores, so the parallelization is contained exclusively within xgboost (a rough sketch of both setups is below).

Or is there no "better" way to perform the parallelization?
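For concreteness, here is a rough sketch of the two setups I have in mind (assuming a data frame dat with a factor outcome Class, as in the answer below; the 5-core count and object names are only illustrative):

library(caret)
library(doMC)

## Setup 1: caret parallelizes the resampling loop, xgboost stays single-threaded
registerDoMC(cores = 5)                 # illustrative core count
ctrl_par <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit1 <- train(Class ~ ., data = dat, method = "xgbTree",
              trControl = ctrl_par, nthread = 1)

## Setup 2: no back-end registered, caret kept sequential,
## xgboost itself uses all physical cores
ctrl_seq <- trainControl(method = "cv", number = 5, allowParallel = FALSE)
fit2 <- train(Class ~ ., data = dat, method = "xgbTree",
              trControl = ctrl_seq,
              nthread = parallel::detectCores(logical = FALSE))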

Edit: I ran the code suggested by @topepo, with tuneLength = 10 and search = "random", and with nthread = 1 specified on the last line (otherwise, as I understand it, xgboost would use multithreading); the exact change is sketched after the results. These are the results I got:

xgb_par[3]
elapsed  
283.691 
just_seq[3]
elapsed 
276.704 
mc_par[3]
elapsed 
89.074 
just_seq[3]/mc_par[3]
elapsed 
3.106451 
just_seq[3]/xgb_par[3]
elapsed 
0.9753711 
xgb_par[3]/mc_par[3]
elapsed 
3.184891

In the end, it turned out that, both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.
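For reference, assuming the foo() helper from @topepo's answer below, the change on the last line was roughly:

registerDoMC(cores = 5)                  # core count shown as in the answer; adjust to your machine
## force xgboost to a single thread so only caret's workers run in parallel
mc_par <- system.time(foo(nthread = 1))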

Upvotes: 12

Views: 6097

Answers (1)

topepo

Reputation: 14316

It is not simple to project what the best strategy would be. My (biased) thought is that you should parallelize the process that takes the longest. Here, that would be the resampling loop, since an open thread/worker would invoke the model many times. The opposite approach, parallelizing the model fit, will start and stop workers repeatedly, which in theory slows things down. Your mileage may vary.

I don't have OpenMP installed, but there is code below to test with (if you could report your results, that would be helpful).

library(caret)
library(plyr)
library(xgboost)
library(doMC)

## fit a random-search xgbTree model; extra arguments (e.g. nthread)
## are passed through train() to xgboost
foo <- function(...) {
  set.seed(2)
  mod <- train(Class ~ ., data = dat, 
               method = "xgbTree", tuneLength = 50,
               ..., trControl = trainControl(search = "random"))
  invisible(mod)
}

set.seed(1)
dat <- twoClassSim(1000)

## baseline: no parallel back-end, xgboost at its default threading
just_seq <- system.time(foo())

## parallelize within xgboost only
## (I don't have OpenMP installed, so nthread should make little difference here)
xgb_par <- system.time(foo(nthread = 5))

## parallelize the resampling loop via foreach/doMC instead
registerDoMC(cores=5)
mc_par <- system.time(foo())

My results (without OpenMP)

> just_seq[3]
elapsed 
326.422 
> xgb_par[3]
elapsed 
319.862 
> mc_par[3]
elapsed 
102.329 
> 
> ## Speedups
> xgb_par[3]/mc_par[3]
elapsed 
3.12582 
> just_seq[3]/mc_par[3]
 elapsed 
3.189927 
> just_seq[3]/xgb_par[3]
 elapsed 
1.020509 
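Not run here, but to mirror the question's second option exactly, you could keep caret explicitly sequential and hand all of the threading to xgboost; a minimal variant (the name foo_xgb_only is just illustrative) would be:

## not run: caret kept sequential, xgboost given all physical cores
foo_xgb_only <- function(threads) {
  set.seed(2)
  mod <- train(Class ~ ., data = dat,
               method = "xgbTree", tuneLength = 50,
               nthread = threads,
               trControl = trainControl(search = "random",
                                        allowParallel = FALSE))
  invisible(mod)
}
xgb_only <- system.time(foo_xgb_only(parallel::detectCores(logical = FALSE)))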

Upvotes: 11
