Feng Tian

Reputation: 1589

How can I use the same seed to get the same output with and without parallel in R?

For example, I need to randomly select 10 numbers from 1 to 500.

1. Without parallel:

set.seed(1)
xxx1 <- sample(1:500, 10)
print(xxx1)

133 186 286 452 101 445 467 326 310 31

2. With parallel:

library("parallel")
cl <- makeCluster(1)
clusterSetRNGStream(cl, 1)  # seed is 1
xxx2 <- parLapply(cl, 1, function(x) { return(sample(1:500, 10)) })[[1]]
stopCluster(cl); rm(cl)
print(xxx2)

339 214 454 475 417 171 177 212 221 198

I used the same seed but got different output.
How can I make xxx1 the same as xxx2?

Upvotes: 2

Views: 1013

Answers (2)

HenrikB

Reputation: 6815

Parallel RNGs (e.g. parallel::clusterSetRNGStream()) use the L'Ecuyer-CMRG method, whereas the sequential RNG defaults to the Mersenne-Twister method (cf. RNGkind()). That is why the same seed gives different numbers in your two examples. It is possible to use L'Ecuyer-CMRG RNG streams in sequential mode too, but I'd expect it to be a bit tedious to get right. Whatever you do, do not use non-parallel RNGs in parallel mode.
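For the single-worker case in the question, a minimal sketch of the sequential route might look like the following. It assumes, per the parallel package documentation, that clusterSetRNGStream(cl, 1) selects L'Ecuyer-CMRG, calls set.seed(1), and hands that state to the first worker; reproducing it sequentially then means selecting the same RNG kind before seeding.

```r
library(parallel)

# Parallel version: clusterSetRNGStream() selects L'Ecuyer-CMRG, seeds it
# with set.seed(1) and gives that stream to the first (here: only) worker.
cl <- makeCluster(1)
clusterSetRNGStream(cl, 1)
xxx2 <- parLapply(cl, 1, function(x) sample(1:500, 10))[[1]]
stopCluster(cl)

# Sequential version: same RNG kind, same seed, in the main R session.
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
xxx1 <- sample(1:500, 10)
RNGkind("default")  # switch back to Mersenne-Twister afterwards

print(identical(xxx1, xxx2))
```

With more than one worker this gets tedious, as noted above: worker k receives the stream obtained by applying parallel::nextRNGStream() k - 1 times to the first one, and parLapply() splits the input elements across workers in chunks, so the sequential code has to mirror both.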

The future.apply package (disclaimer: I'm the author) lets you produce identical parallel random numbers regardless of whether you use sequential or parallel processing; you get exactly the same results. This also holds across all types of parallel backends and regardless of how many parallel workers you use. For example,

library(future.apply)

plan(sequential) # default
set.seed(1)
y0 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)

plan(multisession) # PSOCK cluster == parallel::makeCluster()
set.seed(1)
y1 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y1, y0))

plan(multisession, workers = 2)
set.seed(1)
y2 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y2, y0))

plan(multisession, workers = 3)
set.seed(1)
y3 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y3, y0))

plan(multicore)  ## forked processing == parallel::mclapply()
set.seed(1)
y4 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y4, y0))

plan(future.callr::callr)  ## background R session via callr package
set.seed(1)
y5 <- future_lapply(1:3, function(x) sample(1:500, 10), future.seed = TRUE)
stopifnot(identical(y5, y0))

Upvotes: 1

Mankind_2000

Reputation: 2218

You are setting a seed for clusterSetRNGStream(), which lets you generate and reproduce the same set of random streams across parallel runs of that function. It will not, however, reproduce what set.seed(1) gives you in the main session, which is what you intended.

You can instead set the seed inside your function to reproduce the output of both implementations. Something like:

# w/o parallel
set.seed(1)
xxx1 <- sample(1:500, 10)
print(xxx1)
# [1] 133 186 286 452 101 445 467 326 310  31

# w parallel
library("parallel")
cl <- makeCluster(1)
xxx2 <- parLapply(cl, 1, function(x) { set.seed(1); return(sample(1:500, 10)) })[[1]]
stopCluster(cl); rm(cl)
print(xxx2)
# [1] 133 186 286 452 101 445 467 326 310  31
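One caveat worth keeping in mind: calling set.seed(1) inside the function resets the generator on every call, so as soon as the function is applied to more than one element, every element receives the same draws. The sketch below (two workers and three elements are arbitrary choices for illustration) shows this.

```r
library(parallel)

cl <- makeCluster(2)
# Each call resets the worker's RNG to seed 1 before sampling ...
res <- parLapply(cl, 1:3, function(x) { set.seed(1); sample(1:500, 10) })
stopCluster(cl)

# ... so all three elements contain exactly the same 10 numbers.
print(identical(res[[1]], res[[2]]) && identical(res[[2]], res[[3]]))
```

This is fine for reproducing a single draw, as in the question, but it is the kind of misuse of non-parallel RNGs in parallel mode that the other answer warns against.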

Upvotes: 1
