donpresente

Reputation: 1310

R: llply fully reproducible results in parallel

What changes should I make to get reproducible results here? I run it multiple times and the result vector is different each time. Thanks for any help.

cl <- makeCluster(2)

registerDoParallel(2)

set.seed(123)

results <- unlist(llply(seq_along(1:4), .fun = function(x){
  runif(1)} ,.parallel = T, 
  .paropts = list(.export=ls(.GlobalEnv))))


stopCluster(cl)

Upvotes: 3

Views: 1322

Answers (1)

Steve Weston

Reputation: 19677

The following example will give reproducible results on Linux, Mac OS X, and Windows:

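library(plyr)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)          # register this cluster so doParallel actually uses it
opts <- list(preschedule=TRUE)  # fixed task-to-worker mapping instead of load balancing
clusterSetRNGStream(cl, 123)    # seed the RNG streams on the cluster workers
r <- llply(1:20,
           .fun = function(x) runif(10),
           .parallel = TRUE,
           .paropts = list(.options.snow=opts))
stopCluster(cl)                 # shut the workers down when finished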

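The preschedule=TRUE option is needed to prevent doParallel from using load balancing, which would make the mapping of tasks to workers unpredictable. With an unpredictable mapping, a given task could draw from a different worker's random number stream on each run.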
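One way to check this is to run the same computation twice with the same seed and compare the results. The following is a minimal sketch, not a definitive test: run_once is a hypothetical helper name introduced here, and it assumes that both runs use the same number of workers. plyr may emit warnings when .parallel = TRUE is used; those do not affect the comparison.

```r
library(plyr)
library(doParallel)

# Hypothetical helper: builds a fresh 2-worker cluster, seeds the worker
# RNG streams, and runs the prescheduled llply call from the example above.
run_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))          # always shut the workers down
  registerDoParallel(cl)
  clusterSetRNGStream(cl, seed)
  llply(1:20,
        .fun = function(x) runif(10),
        .parallel = TRUE,
        .paropts = list(.options.snow = list(preschedule = TRUE)))
}

first  <- run_once(123)
second <- run_once(123)

# If the setup is reproducible, the two result lists match exactly.
print(identical(first, second))
```

If the comparison reports a difference, the first things to check are that the cluster was passed to registerDoParallel and that the worker count is the same in both runs.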

If you're using Linux or Mac OS X and you want doParallel to use mclapply, you could use this approach:

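if (.Platform$OS.type != "windows") {
  registerDoParallel(2)      # no cluster object, so doParallel uses mclapply here
  RNGkind("L'Ecuyer-CMRG")   # RNG kind that supports separate parallel streams
  set.seed(123)
  mc.reset.stream()          # reset the stream for child processes to the current seed
  r <- llply(1:20,
             .fun = function(x) runif(10),
             .parallel = TRUE)
}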

This works because mclapply uses prescheduling by default. It won't work on Windows because doParallel will implicitly create a cluster object there, and RNG initialization done in the master session won't have any effect on that cluster's workers.

Note that in your example, you're creating a cluster object but not registering it, so it isn't going to be used by doParallel. You've got to use registerDoParallel(cl); otherwise doParallel will use either mclapply on a POSIX computer or an implicitly created cluster object on a Windows computer. It's very important to initialize the cluster workers that will actually perform the parallel computations, and calling set.seed in the master session doesn't do that.
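If you're unsure which backend is actually registered, foreach can report it. This is a small sketch using the foreach query functions getDoParRegistered, getDoParName, and getDoParWorkers; the variable name workers is only for illustration, and the exact backend name printed depends on how doParallel was registered.

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)   # explicitly register the cluster that was created

# Ask foreach what it will dispatch parallel work to.
print(getDoParRegistered())   # whether a parallel backend is registered
print(getDoParName())         # name of the registered backend
workers <- getDoParWorkers()  # number of workers the backend reports
print(workers)

stopCluster(cl)
```

Running the same queries after registerDoParallel(2) instead of registerDoParallel(cl) is a quick way to see whether the explicitly created cluster is being used.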

Upvotes: 4
