I am having difficulty understanding how variables are scoped and passed to functions when using the parallel package:
library(parallel)
test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  result <- parSapply(clust, 1:10, function(x){a + x})
  stopCluster(clust)
  return(result)
}
test()
[1] 2 3 4 5 6 7 8 9 10 11
x = 1
test(x)
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object 'x' not found
test() works, but test(x) doesn't. When I modify the function as follows, it works:
test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  y = a
  result <- parSapply(clust, 1:10, function(x){y + x})
  stopCluster(clust)
  return(result)
}
x = 1
test(x)
Can someone explain what is going on in memory?
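A small base-R sketch (no cluster needed) can show what the anonymous function carries with it. It assumes, as I understand the parallel package to work, that parSapply() serializes FUN together with its enclosing environment, which here is the frame of test(). The helper inspect() is hypothetical and simply mirrors the shape of test():

```r
# Hypothetical helper mirroring test(): it builds the same kind of closure,
# but instead of sending it to a cluster it reports what the closure holds.
inspect <- function(a = 1) {
  f <- function(x) a + x  # same shape as the anonymous function in test()
  list(
    promise_expr = substitute(a),      # unevaluated expression behind `a`
    closure_vars = ls(environment(f))  # bindings in f's enclosing frame
  )
}

x <- 1
out <- inspect(x)
str(out)
```

Here out$promise_expr is the symbol x, not the value 1: until something uses a, the frame only records "evaluate x in the caller's environment", and that unevaluated recipe is what travels with the closure.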
I would prefer to use foreach() rather than parSapply():
library(doParallel)
test <- function(a = 1) {
  no_cores <- detectCores() - 1
  clust <- makeCluster(no_cores)
  registerDoParallel(clust)
  on.exit(stopCluster(clust), add = TRUE)
  foreach(x = 1:10, .combine = 'c') %dopar% { a + x }
}
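You don't need to force a to be evaluated when using foreach(), because foreach() exports the variables referenced in the loop body to the workers, which evaluates a in the master process first.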
Moreover, you can register the parallel backend outside the function if you want.
See a tutorial on using foreach() there (disclaimer: I'm the author of the tutorial).
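A minimal sketch of registering the backend once, outside the function. It assumes the doParallel package is installed; the two-worker cluster size is an arbitrary choice for illustration:

```r
library(doParallel)  # also attaches foreach and parallel

# Register the backend once, outside the function (2 workers chosen arbitrarily)
clust <- makeCluster(2)
registerDoParallel(clust)

test <- function(a = 1) {
  # foreach() looks up `a` in this frame and exports its value to the workers
  foreach(x = 1:10, .combine = 'c') %dopar% { a + x }
}

x <- 1
res <- test(x)
print(res)

# The cluster outlives test(), so it has to be stopped explicitly
stopCluster(clust)
```

Registering once avoids paying the cluster start-up cost on every call; the trade-off is that stopCluster() is now your responsibility rather than the function's.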
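This is due to lazy evaluation. The argument a is not evaluated when test() is called, but only when it is first used. In the failing call test(x), a is still an unevaluated promise (the expression x, to be looked up in the caller's global environment) when the anonymous function is sent to the workers, so each worker tries to find x in its own global environment, where it does not exist. With test(), the default value 1 is a constant, so evaluating it on a worker is harmless. You can fix it by forcing the evaluation: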
test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  force(a)  # evaluate `a` now, on the master, where `x` exists
  result <- parSapply(clust, 1:10, function(x){a + x})
  stopCluster(clust)
  return(result)
}
x = 1
test(x)
# [1] 2 3 4 5 6 7 8 9 10 11
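The same failure can be reproduced without any cluster, which may make the lazy-evaluation explanation easier to see. This sketch uses two hypothetical closure factories, make_adder() and make_adder_forced(), and imitates a worker's empty global environment by removing x before the closures are called:

```r
# Hypothetical factory WITHOUT force(): `a` stays an unevaluated promise for `x`
make_adder <- function(a) function(y) a + y

# Hypothetical factory WITH force(): `a` is evaluated while `x` still exists
make_adder_forced <- function(a) {
  force(a)
  function(y) a + y
}

x <- 1
lazy_add   <- make_adder(x)
forced_add <- make_adder_forced(x)

# Imitate a worker: the variable `x` is no longer visible
rm(x)

lazy_result   <- try(lazy_add(1), silent = TRUE)  # promise looks up `x` and fails
forced_result <- forced_add(1)                    # value was captured earlier

print(inherits(lazy_result, "try-error"))
print(forced_result)
```

lazy_add() fails with "object 'x' not found", just like the workers, while forced_add() returns 2 because the value of a was fixed before x disappeared.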