hjw

Reputation: 1289

Variables in function arguments do not pass to cluster when parallel computing

I am having difficulty understanding how variables are scoped and passed to functions when using the parallel package.

library(parallel)

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  result <- parSapply(clust, 1:10, function(x){a + x})
  stopCluster(clust)
  return(result)
}

test()
[1]  2  3  4  5  6  7  8  9 10 11

x = 1
test(x)

Error in checkForRemoteErrors(val) : 
3 nodes produced errors; first error: object 'x' not found

test() works but test(x) doesn't. When I modify the function as follows, it works.

test <- function(a = 1){
  no_cores <- detectCores()-1
  clust <- makeCluster(no_cores)
  y = a
  result <- parSapply(clust, 1:10, function(x){y + x})
  stopCluster(clust)
  return(result)
}

x = 1
test(x)

Can someone explain what is going on in memory?

Upvotes: 3

Views: 534

Answers (2)

F. Privé

Reputation: 11728

I would prefer to use foreach() instead of parSapply():

library(doParallel)

test <- function(a = 1) {
  no_cores <- detectCores() - 1
  registerDoParallel(clust <- makeCluster(no_cores))
  on.exit(stopCluster(clust), add = TRUE)
  foreach(x = 1:10, .combine = 'c') %dopar% { a + x }
}

You don't need to force a to be evaluated when using foreach(). Moreover, you can register the parallel backend outside the function if you want.

See a tutorial on using foreach() there (disclaimer: I'm the author of the tutorial).

Upvotes: 0

mt1022

Reputation: 17289

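This is due to lazy evaluation. The argument a is not evaluated inside the function until its first use. In the first case, a is still an unevaluated promise (the expression x) when the anonymous function is sent to the cluster, so each worker tries to evaluate x itself and fails because x does not exist there. With the default a = 1 the expression is just the constant 1, which is why test() works. Assigning y = a in your second version works because the assignment evaluates a before the function is sent. You can get the same effect by forcing the evaluation: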

test <- function(a = 1){
    no_cores <- detectCores()-1
    clust <- makeCluster(no_cores)
    force(a)    # <------------------------
    result <- parSapply(clust, 1:10, function(x){a + x})
    stopCluster(clust)
    return(result)
}

x = 1
test(x)
#  [1]  2  3  4  5  6  7  8  9 10 11
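
The same mechanism can be seen without a cluster. The following is only a minimal sketch, and make_adder, lazy_add and eager_add are hypothetical names that do not appear in the question. It shows that an unforced argument is looked up only when it is first used, whereas force() fixes its value at call time:

```r
# Hypothetical helper: returns a closure that adds `a` to its input.
# With forced = FALSE, `a` stays an unevaluated promise until first use.
make_adder <- function(a, forced = FALSE) {
  if (forced) force(a)   # evaluate the promise now, while `x` is still 1
  function(x) a + x
}

x <- 1
lazy_add  <- make_adder(x)                 # promise for `x`, not yet evaluated
eager_add <- make_adder(x, forced = TRUE)  # `a` is already fixed at 1

x <- 100                                   # change x before the closures are used

lazy_result  <- lazy_add(1)   # promise evaluated only now, so it sees x = 100
eager_result <- eager_add(1)  # uses the value captured by force()

print(lazy_result)   # 101
print(eager_result)  # 2
```

On a cluster worker the late lookup has nothing to find, since x only exists in the master session, which is consistent with the "object 'x' not found" error above.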

Upvotes: 2
