Reputation: 463
I am currently using the parallel package in R, and I am trying to make my work reproducible by setting seeds.
However, if I set the seed before creating the cluster and running the tasks in parallel, the results are still not reproducible. I think I need to set the seed for each core when I make the cluster.
I have made a small example here to illustrate my problem:
library(parallel)
# function to generate 2 uniform random numbers
runif_parallel <- function() {
  # make cluster of two cores
  cl <- parallel::makeCluster(2)
  # sample uniform random numbers
  samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) runif(1))
  # close cluster
  parallel::stopCluster(cl)
  return(unlist(samples))
}
set.seed(41)
test1 <- runif_parallel()
set.seed(41)
test2 <- runif_parallel()
# they should be the same since they have the same seed
identical(test1, test2)
# [1] FALSE
In this example, test1 and test2 should be the same, as they were generated with the same seed, but they contain different numbers.
Can I get some help with where I'm going wrong, please?
Note that I've written this example this way to mimic how I'm using it right now; there are probably cleaner ways to generate two uniform random numbers in parallel.
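For contrast, here is a sequential version of the same computation (runif_serial is a hypothetical name used only for this illustration). Here set.seed() does make the result reproducible, because every draw happens in the master R process; the workers started by makeCluster() are separate R processes, and their random number generators are not affected by set.seed() on the master.

```r
# Same computation as runif_parallel(), but run sequentially in the master process
runif_serial <- function() {
  samples <- lapply(1:2, function(i) runif(1))
  unlist(samples)
}

set.seed(41)
test1 <- runif_serial()
set.seed(41)
test2 <- runif_serial()
# both draws come from the master's RNG, so the seed controls them
identical(test1, test2)
# [1] TRUE
```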
Upvotes: 4
Views: 3839
Reputation: 73482
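We may use parallel::clusterSetRNGStream(), which selects the "L'Ecuyer-CMRG" generator and gives each worker its own random number stream, all derived from the iseed argument.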
library(parallel)
CL <- makeCluster(detectCores() - 1)
clusterSetRNGStream(CL, 42) ## set seed
t(parSapply(CL, 1:3, \(i) runif(1)))
# [,1] [,2] [,3]
# [1,] 0.1738456 0.5004388 0.127589
clusterSetRNGStream(CL, 42) ## set same seed again
t(parSapply(CL, 1:3, \(i) runif(1)))
# [,1] [,2] [,3]
# [1,] 0.1738456 0.5004388 0.127589
stopCluster(CL)
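Applied to the function from the question, a minimal sketch could take the seed as an argument and call clusterSetRNGStream() right after the cluster is created. The seed argument is an assumption added for illustration. The sketch also switches parLapplyLB() to parLapply(): each worker has its own stream, and with load balancing the assignment of tasks to workers can change between runs once there are more tasks than workers, whereas parLapply() always sends the same chunk to the same worker, so the results stay reproducible as long as the number of workers is unchanged.

```r
library(parallel)

# runif_parallel() from the question, with a hypothetical `seed` argument
runif_parallel <- function(seed) {
  cl <- makeCluster(2)
  # shut the workers down when the function exits, even on error
  on.exit(stopCluster(cl))
  # give each worker its own L'Ecuyer-CMRG stream, all derived from `seed`
  clusterSetRNGStream(cl, iseed = seed)
  # static scheduling: each task always goes to the same worker
  samples <- parLapply(cl, X = 1:2, fun = function(i) runif(1))
  unlist(samples)
}

test1 <- runif_parallel(41)
test2 <- runif_parallel(41)
identical(test1, test2)
# [1] TRUE
```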
Upvotes: 0
Reputation: 1
In case it might be of help, here is one quick way to set a separate seed for each core:
# Set number of cores, here to n-1
nCores <- parallel::detectCores() - 1
# Parallelization, n-1 cores; register the cluster as the default one
parallel::setDefaultCluster(cl = (cl <- parallel::makeCluster(nCores)))
thisSeed <- 1
# Set a different seed in each worker: thisSeed for the first worker, thisSeed + 1 for the second, etc.
# Do something in each worker (clusterApply sends one element of x to each worker)
(parRes <- do.call(c, parallel::clusterApply(x = 1:nCores + thisSeed - 1, fun = function(x) {
  set.seed(x)
  runif(1)
})))
# Stop cluster
parallel::stopCluster(cl = cl)
# Same thing, not in parallel
singleRes <- do.call(c, lapply(1:nCores + thisSeed - 1, function(x) {
  set.seed(x)
  runif(1)
}))
# Verify that the results are the same
all(parRes == singleRes)
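A related option is to pre-compute one L'Ecuyer-CMRG stream per task on the master with parallel::nextRNGStream() (documented under ?nextRNGStream) and have each task install its own stream before drawing. Because the seed travels with the task rather than with the worker, the results should not depend on how many workers there are or on how parLapplyLB() schedules the tasks. This is a sketch: the helper names make_task_seeds() and run_task() are hypothetical, and make_task_seeds() switches the master process to the L'Ecuyer-CMRG generator.

```r
library(parallel)

# Hypothetical helper: one L'Ecuyer-CMRG stream per task, all derived from `seed`.
# Note: this switches the master process to the L'Ecuyer-CMRG generator.
make_task_seeds <- function(n, seed) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  seeds <- vector("list", n)
  seeds[[1]] <- .Random.seed
  for (i in seq_len(n - 1)) seeds[[i + 1]] <- nextRNGStream(seeds[[i]])
  seeds
}

# Hypothetical task: install this task's stream, then draw,
# so the result does not depend on which worker runs it
run_task <- function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(1)
}

seeds <- make_task_seeds(4, seed = 41)

cl <- makeCluster(2)
par_res <- unlist(parLapplyLB(cl, seeds, run_task))
stopCluster(cl)

# the same tasks run sequentially in the master give the same numbers
seq_res <- unlist(lapply(seeds, run_task))
identical(par_res, seq_res)
# [1] TRUE
```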
Upvotes: 0
Reputation: 4614
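You need to run set.seed within each job.
Here is a reproducible random generation:
cl <- parallel::makeCluster(2)
# set a seed on every worker (not strictly needed here, since each job below calls set.seed itself)
invisible(parallel::clusterEvalQ(cl, set.seed(41)))
# sample uniform random numbers, seeding each job with its own index
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) {
  set.seed(i)
  runif(1)
})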
samples
# [[1]]
# [1] 0.2655087
#
# [[2]]
# [1] 0.1848823
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) {
  set.seed(i)
  runif(1)
})
samples
# [[1]]
# [1] 0.2655087
#
# [[2]]
# [1] 0.1848823
parallel::stopCluster(cl)
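The set.seed(i) inside each job is what makes this work. As a quick check of why giving every worker the same seed with clusterEvalQ(cl, set.seed(41)) would not be enough on its own, the sketch below does only that and lets each of the two workers draw once. Both workers start from the same RNG state, so they return the same number: reproducible, but not independent draws.

```r
library(parallel)

cl <- makeCluster(2)
# the same seed on every worker...
invisible(clusterEvalQ(cl, set.seed(41)))
# ...and parLapply() sends task 1 to worker 1 and task 2 to worker 2
samples <- unlist(parLapply(cl, X = 1:2, fun = function(i) runif(1)))
stopCluster(cl)

# both workers drew from identical states, so the two "random" numbers match
samples[1] == samples[2]
# [1] TRUE
```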
Upvotes: 5