Reputation: 367
I am trying to run a function that contains a random number generator. The results are not what I expected, so I ran the following test:
# Case 1
set.seed(100)
A1 = matrix(NA,20,10)
for (i in 1:10) {
A1[,i] = sample(1:100,20)
}
# Case 2
set.seed(100)
A2 = sapply(seq_len(10),function(x) sample(1:100,20))
# Case 3
require(parallel)
set.seed(100)
cl <- makeCluster(detectCores() - 1)
A3 = parSapply(cl,seq_len(10), function(x) sample(1:100,20))
stopCluster(cl)
# Check: Case 1 result equals Case 2 result
identical(A1,A2)
# [1] TRUE
# Check: Case 1 result does NOT equal Case 3 result
identical(A1,A3)
# [1] FALSE
# Check2: Would like to check if it's a matter of ordering
range(rowSums(A1))
# [1] 319 704
range(rowSums(A3))
# [1] 288 612
In the above code, parSapply generates a different set of random numbers from the one in A1 and A2. The purpose of Check2 was to test my suspicion that parSapply might only alter the order, but that doesn't seem to be the case, as the minimum and maximum row sums of the random numbers differ.
I would appreciate it if someone could shed some light on why parSapply gives a different result from sapply. What am I missing here?
Thanks in advance!
Upvotes: 1
Views: 854
Reputation: 22333
Have a look at vignette("parallel"), and in particular at "Section 6 Random-number generation". Among other things, it states the following:
Some care is needed with parallel computation using (pseudo-)random numbers: the processes/threads which run separate parts of the computation need to run independent (and preferably reproducible) random-number streams.
When an R process is started up it takes the random-number seed from the object .Random.seed in a saved workspace or constructs one from the clock time and process ID when random-number generation is first used (see the help on RNG). Thus worker processes might get the same seed because a workspace containing .Random.seed was restored or the random number generator has been used before forking: otherwise these get a non-reproducible seed (but with very high probability a different seed for each worker).
You should also have a look at ?clusterSetRNGStream.
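As a rough illustration of the point above: set.seed(100) in Case 3 only seeds the master process, while the workers are separate R processes with their own seeds. The following is a minimal sketch of how clusterSetRNGStream could be used to make the parallel run reproducible. The worker count of 2 and the names B1 and B2 are assumptions chosen for the example, not part of the question.

```r
library(parallel)

# Assumption: a fixed number of workers (2 here), picked only for illustration
cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream, derived from iseed
clusterSetRNGStream(cl, iseed = 100)
B1 <- parSapply(cl, seq_len(10), function(x) sample(1:100, 20))

# Reset the workers to the same streams and repeat the computation
clusterSetRNGStream(cl, iseed = 100)
B2 <- parSapply(cl, seq_len(10), function(x) sample(1:100, 20))

stopCluster(cl)

# The two parallel runs should now agree with each other
identical(B1, B2)
```

Note that this makes the parallel runs agree with each other, not with A1 or A2: the workers use a different generator and separate streams, so the numbers should not be expected to match the sequential sapply result. The result also depends on the number of workers, since the tasks are split across them.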
Upvotes: 5