Each time I run the following code, the numbers in the vector result_seq remain the same, since I call set.seed(11) before generating the vector.
However, even though I call set.seed(11) again before generating the numbers in result_par, those numbers change every time I run the code.
library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 4)
testFun = function(i) {
result <- rnorm(1,10,3)
}
nsim <- 10
set.seed(11)
result_seq <- sapply(1:nsim, testFun)
print(mean(result_seq))
set.seed(11)
result_par <- sfLapply(1:nsim, testFun)
print(mean(as.numeric(result_par)))
Why is this happening? What can I do to ensure that the random numbers generated during the snowfall parallelization are reproducible?
Reputation: 10671
Since R is single-threaded, parallelizing code really means running multiple R sessions. Here sfInit(cpus = 4) starts 4 separate "child" sessions, and sfLapply() hands the work to them.
The seed is set only once, in your "parent" session. Each "child" session has its own random number generator state and never sees that call, so set.seed(11) in the parent has no effect on the numbers the children draw.
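You can move set.seed() into testFun() to solve this. Seeding with the same constant on every call would make every draw identical, so derive the seed from i instead:
testFun = function(i) {
# a distinct but fixed seed per task, so each draw is reproducible
set.seed(11 + i)
result <- rnorm(1,10,3)
}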
sfExport might also be worth exploring, as it is designed to distribute variables from the "parent" to the "child" sessions for contexts like this.
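As an alternative to seeding inside the function, here is a minimal sketch that uses base R's parallel package rather than snowfall (a swapped-in technique, not what the question uses). It assumes a 2-worker cluster, and run_once is a hypothetical helper name introduced only for this illustration. clusterSetRNGStream() gives each worker its own reproducible random number stream derived from a single seed.

```r
library(parallel)

# Hypothetical helper: draws nsim values on a fresh 2-worker cluster.
run_once <- function(seed, nsim = 10) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  # Seed every worker with its own L'Ecuyer-CMRG stream derived from `seed`.
  clusterSetRNGStream(cl, iseed = seed)
  unlist(parLapply(cl, seq_len(nsim), function(i) rnorm(1, 10, 3)))
}

first  <- run_once(11)
second <- run_once(11)
print(identical(first, second))
```

Because parLapply() splits the tasks across workers in a fixed way, the results should repeat only as long as the number of workers stays the same. They will not match result_seq from the sequential run, since the workers use a different generator.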