generic_user

Reputation: 3562

R seed-setting not "setting", results not reproducing

I've got a script that looks like this:

#This is the master script.  It runs all other scripts.
rm(list=ls()) 

#Run data cleaning script
source("datacleaning.R")

set.seed(413) #Seed pre-selected as lead author's wife's birthday (April 13th)
reps=128

#Make imputed datasets
source("makeimps.R")

#Model selection step 1.  
source("model_selection.1.R")
load("AIC_results.1")
AIC_results

#best model removed the year interaction

#Model selection step 2.  removed year interaction
source("model_selection.2.R")
load("AIC_results.2")
AIC_results

#all interactions pretty good.  keeping this model

#Final selected model:
source("selectedmodel.R")

I send this master script to a supercomputing cluster; it takes about 17 hours of CPU time and 40 minutes of walltime on 32 cores. (Hence my non-reproducible example). But when I run the script, look at the results, then run it again, and look at the results again, they are slightly different. Why? I set the seed! Does the seed get reset somehow? Do I need to specify the seed inside of each script file?

I need to increase the number of reps, because it's clear that I haven't converged sufficiently. But that's a separate issue. Why are my results not reproducing, and how do I fix it?

Thanks in advance.

EDIT: I'm doing the parallelization through doMC and plyr. Some light googling based on comments below turns up the fact that one can't really set a "parallel seed" using these packages. I'd need to migrate my code to SNOW somehow. If anyone knows a solution with doMC and plyr, I'd be grateful to learn what it is.

Upvotes: 1

Views: 404

Answers (1)

Simon O'Hanlon

Reputation: 59990

Look at the doRNG package, which was developed specifically for this kind of reproducible parallel computing. Set the seed inside the call to the loop (via .options.RNG) and you will be able to reproduce your results exactly:

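library(doParallel)
library(doRNG)

# Start a 4-worker cluster and register it as the foreach backend
cl <- makeCluster(4)
registerDoParallel(cl)

# .options.RNG seeds the loop's RNG streams, so both runs give the same draws
unlist( foreach( i = 1:4 , .options.RNG = 413 ) %dorng% { runif(1) } )
#[1] 0.5251507 0.4326805 0.6409496 0.5523651

unlist( foreach( i = 1:4 , .options.RNG = 413 ) %dorng% { runif(1) } )
#[1] 0.5251507 0.4326805 0.6409496 0.5523651

# Shut the workers down when finished
stopCluster(cl)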
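
If you do end up moving to a SNOW-style cluster as mentioned in the question's edit, the same idea can be sketched with only the base parallel package. This is a rough illustration, not a drop-in replacement for your scripts: the helper name draw_once and the 2-worker cluster are made up for the example, and the results only repeat if the number of workers and the task list stay the same between runs.

```r
library(parallel)

# Hypothetical 2-worker cluster; the size is an assumption for illustration
cl <- makeCluster(2)

# Illustrative helper: re-seed every worker's RNG stream, then draw in parallel
draw_once <- function() {
  # Gives each worker its own L'Ecuyer-CMRG stream derived from the seed
  clusterSetRNGStream(cl, iseed = 413)
  unlist(parLapply(cl, 1:4, function(i) runif(1)))
}

first  <- draw_once()
second <- draw_once()

stopCluster(cl)

# Both runs used the same seed, worker count, and task split
identical(first, second)
```

A plain set.seed() in the master script only seeds the master process, which is why the worker streams have to be seeded explicitly like this.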

Upvotes: 2
