sonicboom

Reputation: 5028

Results of parallelization with snowfall library not reproducible?

Each time I run the following code, the numbers in the vector result_seq remain the same, since I have used set.seed(11) before generating the vector.

However, it seems that even though I use set.seed(11) again before I generate the numbers in result_par, the numbers change every time I run the code.

library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 4)

testFun = function(i) {
  result <- rnorm(1,10,3)
}

nsim <- 10

set.seed(11)
result_seq <- sapply(1:nsim, testFun)
print(mean(result_seq))

set.seed(11)
result_par <- sfLapply(1:nsim, testFun)
print(mean(as.numeric(result_par)))

Why is this happening? What can I do to ensure that the random numbers generated during the snowfall parallelization are reproducible?

Upvotes: 0

Views: 114

Answers (1)

Nate

Reputation: 10671

Since R is single-threaded, parallelizing code actually spins up multiple R sessions. Here, sfLapply() dispatches the work to 4 separate "child" sessions, while set.seed(11) is called only once, in the "parent" session. Each child session has its own RNG state; the children are not aware of each other, and not aware that you want the seed re-set in each of them.
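To illustrate, the seed can be pushed into every child session directly with sfClusterEval(), which evaluates an expression on each worker. A minimal sketch, assuming the 4 workers from sfInit() are already running; note that seeding all workers identically means their streams are identical to one another, which may or may not be what you want:

```r
# Seed each child session, not just the parent
sfClusterEval(set.seed(11))

# Every worker now starts from the same RNG state, so re-running
# this call yields the same numbers each time
result_par <- sfLapply(1:10, function(i) rnorm(1, 10, 3))
print(mean(as.numeric(result_par)))
```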

You can move set.seed() into testFun() to solve this:

testFun <- function(i) {
  set.seed(11)               # seeds the RNG inside whichever worker runs this call...
  result <- rnorm(1, 10, 3)  # ...so every element is now the same (reproducible) draw
}

sfExport() might also be worth exploring, as it is designed to distribute objects (such as a seed parameter) from the parent to the "child" sessions in contexts like this.
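For reproducibility without making every element identical, snowfall also provides sfClusterSetupRNG(), which initializes a proper parallel random number stream on each worker. A sketch, assuming the rlecuyer package (which backs the default "RNGstream" type) is installed:

```r
library(snowfall)
sfInit(parallel = TRUE, cpus = 4)

# Give each worker its own, reproducibly derived RNG stream
sfClusterSetupRNG(type = "RNGstream", seed = 11)

testFun <- function(i) {
  rnorm(1, 10, 3)
}

# Elements differ from one another, but the whole run is
# reproducible across invocations
result_par <- sfLapply(1:10, testFun)
print(mean(as.numeric(result_par)))

sfStop()
```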

Upvotes: 1
