I am trying to fit many xgboost models with different parameters (e.g. for parameter tuning), and running them in parallel is needed to reduce the time. However, when the %dopar% loop runs I get the following error:
Error in unserialize(socklist[[n]]) : error reading from connection
Below is a reproducible example. The problem seems specific to xgboost, since any other calculation involving global variables works inside the %dopar% loop. Could someone point out what is missing or wrong with this approach?
#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)
#### Data Sim
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n,0,1)
#### Init XGB
train = xgb.DMatrix(data = X[-((n-10):n),], label = y[-((n-10):n)])
test = xgb.DMatrix(data = X[(n-10):n,], label = y[(n-10):n])
watchlist = list(train = train, test = test)
#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)
clusterEvalQ(cl, {
  library(xgboost)
})
pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  xgb.train(data = train, watchlist = watchlist, max_depth = i,
            nrounds = 1000, early_stopping_rounds = 10)$best_score
  # if xgb.train is replaced with anything else, e.g. 1 + y, it works
}
stopCluster(cl)
Upvotes: 1
Views: 2464
As noted in the comments by HenrikB, xgb.DMatrix objects can't be shipped to parallel workers: they wrap an external pointer, so they don't survive serialization to the worker processes. To get around this we can create the objects inside of the foreach loop:
#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)
#> Loading required package: iterators
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)
pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  # BRING CREATION OF XGB MATRIX INSIDE OF foreach
  dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
  dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
  watchlist = list(dtrain = dtrain, dtest = dtest)
  param <- list(max_depth = i, eta = 0.01, verbose = 0,
                objective = "binary:logistic", eval_metric = "auc")
  bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
  bst$best_score
}
stopCluster(cl)
pred
#> [[1]]
#> dtest-auc
#> 0.892138
#>
#> [[2]]
#> dtest-auc
#> 0.987974
#>
#> [[3]]
#> dtest-auc
#> 0.986255
#>
#> [[4]]
#> dtest-auc
#> 1
#> ...
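The same fix carries over to the regression reprex in the question: export the plain matrix X and the label vector y, and build the xgb.DMatrix handles on each worker. Here is a minimal sketch under that assumption (verbose = 0 and the which.min summary at the end are additions for illustration, not part of the original question):
#### Same pattern applied to the question's simulated regression data (sketch)
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)

n = 1000
X = cbind(runif(n, 10, 20), runif(n, 0, 10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n, 0, 1)

cl = parallel::makeCluster(detectCores(), setup_strategy = "sequential")
doParallel::registerDoParallel(cl)

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  # X, y and n are ordinary R objects, so foreach can export them;
  # the xgb.DMatrix handles are created on each worker instead of in the global env.
  train = xgb.DMatrix(data = X[-((n-10):n), ], label = y[-((n-10):n)])
  test  = xgb.DMatrix(data = X[(n-10):n, ], label = y[(n-10):n])
  watchlist = list(train = train, test = test)
  xgb.train(data = train, watchlist = watchlist, max_depth = i,
            nrounds = 1000, early_stopping_rounds = 10, verbose = 0)$best_score
}
stopCluster(cl)

# The default eval metric for this regression setup is RMSE, so the smallest score wins:
which.min(unlist(pred))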
Since xgb.train is already parallelized internally (it uses multiple threads, controlled by nthread), it might be interesting to see the difference in speed between giving the threads to xgboost and using them to run the tuning rounds in parallel.
To do this I wrapped the code in a function and benchmarked the different combinations:
tune_par <- function(xgbthread, doparthread) {
  data(agaricus.train, package = 'xgboost')
  data(agaricus.test, package = 'xgboost')
  #### Init parallel & run
  cl = parallel::makeCluster(doparthread, setup_strategy = "sequential")
  doParallel::registerDoParallel(cl)
  clusterEvalQ(cl, {
    data(agaricus.train, package = 'xgboost')
    data(agaricus.test, package = 'xgboost')
  })
  pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    watchlist = list(dtrain = dtrain, dtest = dtest)
    param <- list(max_depth = i, eta = 0.01, verbose = 0, nthread = xgbthread,
                  objective = "binary:logistic", eval_metric = "auc")
    bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
    bst$best_score
  }
  stopCluster(cl)
  pred
}
In my testing, evaluation was faster when using more threads for xgboost and fewer for the parallel running of tuning rounds. What works best probably depends on system specs and the amount of data.
# 16 logical cores split between xgb threads and threads in dopar cluster:
microbenchmark::microbenchmark(
  xgb16par1 = tune_par(xgbthread = 16, doparthread = 1),
  xgb8par2  = tune_par(xgbthread = 8,  doparthread = 2),
  xgb4par4  = tune_par(xgbthread = 4,  doparthread = 4),
  xgb2par8  = tune_par(xgbthread = 2,  doparthread = 8),
  xgb1par16 = tune_par(xgbthread = 1,  doparthread = 16),
  times = 5
)
#> Unit: seconds
#> expr min lq mean median uq max neval cld
#> xgb16par1 2.295529 2.431110 2.500170 2.519277 2.527914 2.727021 5 a
#> xgb8par2 2.301189 2.308377 2.407767 2.363422 2.465446 2.600402 5 a
#> xgb4par4 2.632711 2.778304 2.875816 2.825471 2.849003 3.293593 5 b
#> xgb2par8 4.508485 4.682284 4.752776 4.810461 4.822566 4.940085 5 c
#> xgb1par16 8.493378 8.550609 8.679931 8.768008 8.779718 8.807943 5 d
Upvotes: 2