Reputation: 1589
For example, I need to randomly select 10 numbers from the integers 1 to 500.
set.seed(1)
xxx1 <- sample(1:500, 10)
print(xxx1)
133 186 286 452 101 445 467 326 310 31
library("parallel")
cl <- makeCluster(1)
clusterSetRNGStream(cl, 1) # seed is 1
xxx2 <- parLapply(cl, 1, function(x) { return(sample(1:500, 10)) })[[1]]
stopCluster(cl); rm(cl)
print(xxx2)
339 214 454 475 417 171 177 212 221 198
I used the same seed but got different output.
How can I make xxx1 the same as xxx2?
Upvotes: 2
Views: 1013
Reputation: 6815
Parallel RNGs (e.g. parallel::clusterSetRNGStream()) use the L'Ecuyer-CMRG method. Sequential RNG defaults to the Mersenne-Twister method, cf. RNGkind(). It is possible to use L'Ecuyer-CMRG RNG streams in sequential mode too, but I'd expect it to be a bit tedious to get right. Whatever you do, do not use non-parallel RNGs in parallel mode.
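As a rough sketch of the sequential L'Ecuyer-CMRG idea for the single-worker case in the question: as far as I understand it, clusterSetRNGStream(cl, 1) switches to "L'Ecuyer-CMRG", calls set.seed(1), and hands that state to the first worker. Under that assumption, doing the same two steps in the main session should start from the same stream. With more than one worker each worker gets its own stream, so this only mirrors worker 1.

```r
library(parallel)

# Sequential draw, using the generator that clusterSetRNGStream() relies on.
old_kind <- RNGkind()[1]      # remember the current generator (Mersenne-Twister by default)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
xxx1 <- sample(1:500, 10)
RNGkind(old_kind)             # switch back to the previous generator

# Parallel draw on a one-worker cluster, as in the question.
cl <- makeCluster(1)
clusterSetRNGStream(cl, 1)
xxx2 <- parLapply(cl, 1, function(x) sample(1:500, 10))[[1]]
stopCluster(cl)

# Compare the two draws; this relies on the seeding assumption described above.
print(identical(xxx1, xxx2))
```

If the comparison comes back FALSE on your setup, check that both sessions report the same RNGkind() settings, including the sample.kind element.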
(Disclaimer: I'm the author.) The future.apply package lets you generate identical parallel random numbers regardless of whether you use sequential or parallel processing. You will get the exact same results. This also holds for all types of parallel backends and for any number of parallel workers. For example,
library(future.apply)
plan(sequential) # default
set.seed(1)
y0 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
plan(multisession) # PSOCK cluster == parallel::makeCluster()
set.seed(1)
y1 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y1, y0))
plan(multisession, workers = 2)
set.seed(1)
y2 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y2, y0))
plan(multisession, workers = 3)
set.seed(1)
y3 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y3, y0))
plan(multicore) ## forked processing == parallel::mclapply()
set.seed(1)
y4 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y4, y0))
plan(future.callr::callr) ## background R session via callr package
set.seed(1)
y5 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y5, y0))
Upvotes: 1
Reputation: 2218
You are setting a seed for clusterSetRNGStream(), which lets you generate and reproduce the same set of random streams across parallel runs of that function. It will not make the worker reproduce the numbers that set.seed() gives in a sequential session, which is what you intended.
You can probably set a seed inside your function so that both implementations give the same output. Something like:
# w/o parallel
set.seed(1)
xxx1 <- sample(1:500, 10)
print(xxx1)
# [1] 133 186 286 452 101 445 467 326 310 31
# w parallel
library("parallel")
cl <- makeCluster(1)
xxx2 <- parLapply(cl, 1, function(x) { set.seed(1); return(sample(1:500, 10)) })[[1]]
stopCluster(cl); rm(cl)
print(xxx2)
# [1] 133 186 286 452 101 445 467 326 310 31
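One thing to watch with this approach once there is more than one task: a fixed set.seed(1) inside the function restarts the same stream for every element, so all tasks return the same draws. A small sketch using plain lapply() so it runs without a cluster; the per-task variant is only an illustration (the names same_seed and per_task are made up here), and seeds derived from the task index are reproducible but not guaranteed to give statistically independent streams.

```r
# Re-seeding with a constant inside the function: every task restarts the same stream.
same_seed <- lapply(1:3, function(x) { set.seed(1); sample(1:500, 3) })
print(identical(same_seed[[1]], same_seed[[2]]))  # TRUE: the draws repeat across tasks

# Illustrative variation: derive the seed from the task index instead.
per_task <- lapply(1:3, function(x) { set.seed(x); sample(1:500, 3) })
print(per_task)
```

The same functions can be passed to parLapply() unchanged, since the seed is set inside each call.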
Upvotes: 1