Reputation: 11995
I am trying to write an R function that runs a parallelized bootstrapping routine, but I am having difficulty passing the function arguments through parLapply. Below is a (hopefully) reproducible example in which the cluster export step cannot find the values of the arguments:
innerFun <- function(a=rnorm(10), q=0.5){
  quantile(a, probs = q)
}
library(parallel)
bootFun <- function(a=rnorm(10), q=0.5, nperm=10, no_cores = detectCores() - 1){
  parFun <- function(x){
    set.seed(x)
    ai <- sample(a, size=length(a), replace = TRUE)
    return(innerFun(a=ai, q=q))
  }
  ARGS <- list("innerFun", "a", "q", "nperm")
  cl <- parallel::makeCluster(no_cores, type="PSOCK")
  nn <- split(1:nperm, 1:nperm)
  parallel::clusterExport(cl, varlist = ARGS)
  res <- parallel::parLapply(cl, nn, parFun)
  parallel::stopCluster(cl)
  res <- do.call("rbind", res)
  return(res)
}
set.seed(1)
res1 <- bootFun(a=rnorm(100), q=0.5, nperm=10, no_cores = detectCores() - 1)
# Error in get(name, envir = envir) : object 'a' not found
Upvotes: 2
Views: 180
Reputation: 13581
This is one of the trickier aspects of parallel::clusterExport. As it says in the docs,
clusterExport assigns the values on the master R process of the variables named in varlist to variables of the same names in the global environment (aka ‘workspace’) of each node
That is, by default it looks up the variable names in the global environment of the master process, not in the environment of the function that calls it. The default value of the envir argument shows this:
clusterExport(cl = NULL, varlist, envir = .GlobalEnv)
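The lookup behaviour can be reproduced in isolation. The following is a minimal sketch, not part of the original post; `exportFromFunction` and `local_value` are hypothetical names used only for illustration, and it assumes no variable called `local_value` exists in your global environment.

```r
library(parallel)

# Hypothetical helper: tries to export a variable that exists only
# inside this function, using the default envir = .GlobalEnv.
exportFromFunction <- function(cl) {
  local_value <- 42  # lives in the function's environment, not in .GlobalEnv
  tryCatch({
    clusterExport(cl, "local_value")  # looked up in .GlobalEnv by default
    "exported"
  }, error = function(e) conditionMessage(e))
}

cl <- makeCluster(1, type = "PSOCK")
outcome <- exportFromFunction(cl)
stopCluster(cl)

print(outcome)
```

Under that assumption, `outcome` should hold an error message of the same kind as the one in the question rather than the string "exported".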
You need to point envir at the calling function's (non-global) environment instead, like so
clusterExport(cl, varlist, envir = environment())
In your case, update to
parallel::clusterExport(cl, varlist = ARGS, envir = environment())
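For reference, here is a sketch of how the whole function from the question might look with that one change applied. The `on.exit()` cleanup and the explicit `no_cores = 2` in the call are my own additions (assumptions, not from the question): the first so the cluster is stopped even if a worker fails, the second so the example does not depend on how many cores `detectCores()` reports.

```r
library(parallel)

innerFun <- function(a = rnorm(10), q = 0.5) {
  quantile(a, probs = q)
}

bootFun <- function(a = rnorm(10), q = 0.5, nperm = 10,
                    no_cores = detectCores() - 1) {
  # One bootstrap replicate; the element of nn doubles as the seed.
  parFun <- function(x) {
    set.seed(x)
    ai <- sample(a, size = length(a), replace = TRUE)
    innerFun(a = ai, q = q)
  }
  ARGS <- list("innerFun", "a", "q", "nperm")
  cl <- parallel::makeCluster(no_cores, type = "PSOCK")
  on.exit(parallel::stopCluster(cl))  # assumption: added for safer cleanup
  nn <- split(1:nperm, 1:nperm)
  # Look the names up starting from this function's environment,
  # not from .GlobalEnv (the default).
  parallel::clusterExport(cl, varlist = ARGS, envir = environment())
  res <- parallel::parLapply(cl, nn, parFun)
  do.call("rbind", res)
}

set.seed(1)
res1 <- bootFun(a = rnorm(100), q = 0.5, nperm = 10, no_cores = 2)
```

`innerFun` is still found even though it is defined outside `bootFun`, because the lookup starts in the function's environment and then continues through its enclosing environments.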
With that change, the call runs and res1 is
           50%
1   0.11379733
2  -0.01619026
3   0.05117174
4  -0.11234621
5   0.37001881
6   0.07445315
7   0.01455376
8  -0.03924000
9   0.01481569
10  0.18364332
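As an aside that goes beyond the fix above, one possible way to avoid the export step altogether is to hand the data to the worker function through the extra arguments of parLapply. This is only a sketch under my own assumptions: `bootQuantile` is a hypothetical name, and two workers are requested explicitly.

```r
library(parallel)

# Hypothetical worker function: everything it needs arrives as an argument,
# so nothing has to be exported to the nodes beforehand.
bootQuantile <- function(seed, a, q) {
  set.seed(seed)
  ai <- sample(a, size = length(a), replace = TRUE)
  quantile(ai, probs = q)
}

set.seed(1)
a <- rnorm(100)

cl <- makeCluster(2, type = "PSOCK")
# Arguments after the function are forwarded to every call of bootQuantile.
res <- parLapply(cl, 1:10, bootQuantile, a = a, q = 0.5)
stopCluster(cl)

res <- do.call("rbind", res)
```

The trade-off is that the arguments are sent along with the tasks instead of being stored once on each node, which may matter if `a` is large.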
Upvotes: 2