me_overfolwn

Reputation: 64

Using parallel in R for whole scripts

I have a question regarding parallel computation of whole scripts. My script imports data, then randomly splits it into a training and a validation data frame, does some preprocessing, and runs the validation. I want to run the same script with many different seeds.

Is it possible to do this in parallel? The scripts don't interfere with each other.

seeds <- c(2343242,324256,764865,3524526,574574,75624,15436,674767,4325265,2462626,
           245264,647474,2465374,4253532,5787462,35636,357484,34524,74859,1352637)

for (i in 1:length(seeds)) {
  set.seed(seeds[i])
  seed <- seeds[i]
  print(seeds[i])

  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
}

Upvotes: 1

Views: 311

Answers (2)

Guillem Pocull

Reputation: 25

RStudio has a very reliable, easy, and intuitive way of running scripts in parallel called Background Jobs. The following link explains how to use it, but in summary: every time you run a script as a background job, it runs in parallel in its own R session, using another core, and it usually finishes much faster (as long as the CPU and RAM are not already busy). There are two ways to use Background Jobs, the manual and the scripted:

  1. The manual way: you just open the Background Jobs pane and select the script and the working directory. Then you can choose whether or not to copy the global environment into the job. If your script already exports or saves its objects locally, you don't need to worry about that.

  2. The scripted way: you can use rstudioapi::jobRunScript() to create background jobs from code, so you can automate the changes you want across the scripts; see the sketch below.
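To illustrate the scripted way, here is a minimal sketch. It assumes the rstudioapi package is installed and the code is run from within RStudio; "run_one_seed.r" is a hypothetical wrapper script that sets the seed and sources your four scripts, and the exact arguments of jobRunScript() may differ in your version:

library(rstudioapi)

seeds <- c(2343242, 324256, 764865)

for (seed in seeds) {
  # Each call launches one background job, i.e. a separate R session
  # running "run_one_seed.r" (a hypothetical wrapper around your scripts).
  jobRunScript(
    path       = "run_one_seed.r",
    name       = paste0("run with seed ", seed),
    workingDir = getwd(),
    importEnv  = TRUE  # copy the global environment (including `seed`) into the job
  )
}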

Upvotes: 2

HenrikB

Reputation: 6815

Verbatim one-to-one solution:

library(future.apply)
plan(multisession)

seeds <- c(2343242,324256,764865,3524526,574574,75624,15436,674767,4325265,2462626,
           245264,647474,2465374,4253532,5787462,35636,357484,34524,74859,1352637)

empty <- future_lapply(seeds, function(seed) {
  set.seed(seed)
  print(seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
})

Unless those seeds you've picked are essential in some way, you probably want to use a statistically sound parallel RNG instead, which you get automatically if you do:

library(future.apply)
plan(multisession)

set.seed(42) ## Optional to fix the initial seed
n <- 20L     ## Number of runs

empty <- future_lapply(1:n, function(ii) {
  print(.Random.seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
}, future.seed = TRUE)

Since we're not making use of ii here, the latter could equally well be written using the futurized version of base::replicate():

library(future.apply)
plan(multisession)

set.seed(42) ## Optional to fix the initial seed
n <- 20L     ## Number of runs

empty <- future_replicate(n, {
  print(.Random.seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
})

PS. It's not clear to me how you distinguish the results from the different runs. Maybe you rely on the seed to save to different files inside those scripts.
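For instance, here is a minimal sketch of one way to keep the runs apart, assuming each run's sourced scripts leave their final output in an object called `fit` (a hypothetical name; substitute whatever your scripts actually create). Each run returns that object and also writes it to a seed-specific file:

library(future.apply)
plan(multisession)

results <- future_lapply(seeds, function(seed) {
  set.seed(seed)  # reproduce your per-run seeds; future.seed below declares RNG use
  # local = TRUE makes the sourced scripts assign into this function's
  # environment, so objects they create (e.g. `fit`) are visible here
  source(file = "import.r", local = TRUE)
  source(file = "preProc.r", local = TRUE)
  source(file = "algorithms and datasets.r", local = TRUE)
  source(file = "algorithms and datasets up down.r", local = TRUE)
  # also write a per-seed file so partial results survive a crash
  saveRDS(fit, file = paste0("fit_seed_", seed, ".rds"))
  fit
}, future.seed = TRUE)
names(results) <- as.character(seeds)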

Upvotes: 2
