For testing purposes, inside my R package I placed the following function:
parsetup <- function() {
  cl <- parallel::makeCluster(12, type = 'PSOCK')
  parallel::clusterCall(cl, function() 1 + 1)
}
When I run mypkg::parsetup(), it takes ~6s to complete.
When I run parsetup2 <- mypkg::parsetup; parsetup2() in the global environment, it also takes ~6s to complete.
When I run the code defining the parsetup function in the global environment and then run parsetup(), it takes ~0.3s.
This seems rather silly to me; can anyone explain why and/or suggest a workaround? Adding 6s to every function where I want to use parallelisation is pretty frustrating.
edit: The difference in time occurs during the clusterCall; the number of cluster nodes created is 12 in each case.
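For reference, a minimal way to reproduce the comparison (a sketch; it assumes the package is installed as mypkg and uses system.time for the measurement):
system.time(mypkg::parsetup())   # ~6 s when the function lives in the package
parsetup <- function() {
  cl <- parallel::makeCluster(12, type = 'PSOCK')
  parallel::clusterCall(cl, function() 1 + 1)
}
system.time(parsetup())          # ~0.3 s when defined in the global environment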
sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ctsem_3.4.3 testthat_3.0.2 profvis_0.3.7 Rcpp_1.0.6
I think there are a couple of things at play here. The first one is environments and their parents. Depending on the complexity of the function, a lot of data may have to be sent to the parallel processes. Take for instance this code that creates some closures:
cl <- parallel::makeCluster(2L)
fun_factory <- function(x) {
  function() {
    list(x = x, addr = lobstr::obj_addr(x))
  }
}
fun <- fun_factory(0)
lobstr::obj_addr(environment(fun)$x)
# [1] "0x557ad9605068"
str(parallel::clusterCall(cl, fun))
# List of 2
# $ :List of 2
# ..$ x : num 0
# ..$ addr: chr "0x564b16e37e68"
# $ :List of 2
# ..$ x : num 0
# ..$ addr: chr "0x55a5a9437e68"
As you can see, the function is "contained" in an environment that holds x, and when the function is sent to the workers for evaluation, that environment needs to be "replicated", which results in copies of x with different memory addresses. You don't see the exact same behavior with clusterEvalQ because that function only serializes the code expression (see also ?base::quote), so it won't work directly in this example:
#parallel::clusterExport(cl, "fun") # uncomment this to make it work
parallel::clusterEvalQ(cl, fun())
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: could not find function "fun"
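With the export in place (the commented-out line above), the same call succeeds, because the workers can now find fun:
parallel::clusterExport(cl, "fun")   # ship `fun` (and its environment) to the workers
parallel::clusterEvalQ(cl, fun())    # the serialized expression now resolves `fun` there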
Functions defined inside packages have a few environments "surrounding" them. I doubt everything gets serialized for evaluation in the workers, but I'm not surprised the overhead isn't negligible.
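If profiling confirms that serialization of those surrounding environments is the bottleneck, one possible workaround (a sketch, not something I have benchmarked against your package) is to detach the function you send to the workers from the package namespace:
parsetup <- function() {
  cl <- parallel::makeCluster(12, type = 'PSOCK')
  f <- function() 1 + 1
  # strip the package namespace from the closure so that only this
  # small function, not the surrounding environments, gets serialized
  environment(f) <- globalenv()
  parallel::clusterCall(cl, f)
}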
Additionally, when you execute functions that require other packages in the workers, each worker must load those packages for execution, so you don't want to recreate cl all the time.
Some overhead is unavoidable, but to avoid recreating the "cluster" all the time, you could pass the responsibility of its management to the user, which for you as a package developer might simplify a few things, since you don't have to worry about calling parallel::stopCluster.
I personally like the abstractions in foreach. In your package you define functions like:
my_par_fun <- function(x) {
  # note: a package using %dopar% must import the operator,
  # e.g. importFrom(foreach, "%dopar%") in its NAMESPACE
  foreach::foreach(x = x) %dopar% {
    x + 1
  }
}
And the code will execute sequentially if there's no parallel backend registered. If the user wants parallelization, they could install a backend package like doParallel and call something like
cl <- parallel::makeCluster(2L)
doParallel::registerDoParallel(cl)
before calling your package's functions. Once they're done, they can call parallel::stopCluster and foreach::registerDoSEQ, and it all remains transparent to your package code.
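Putting it together, a user-side session could look like this (my_par_fun being the hypothetical package function above):
cl <- parallel::makeCluster(2L)
doParallel::registerDoParallel(cl)   # register the parallel backend

mypkg::my_par_fun(1:10)              # package code now runs via %dopar%

parallel::stopCluster(cl)            # shut the workers down
foreach::registerDoSEQ()             # restore the sequential backend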
And just FYI, when using foreach, you don't need to iterate over data; you could do something like:
my_par_fun <- function(...) {
  foreach::foreach(i = 1L:foreach::getDoParWorkers()) %dopar% {
    # a very time-consuming task
  }
}
That way each worker gets a task, regardless of how many were created by the user.
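To be explicit about the fallback: foreach::getDoParWorkers() returns 1 when no parallel backend is registered, so with the sequential backend the loop body simply runs once, in the main process:
foreach::registerDoSEQ()   # explicitly register the sequential backend
foreach::getDoParWorkers()
# [1] 1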