Reputation: 97
I am trying to sample from two vectors 1,000 times with replacement and calculate the ratio of means, then repeat this whole process 10,000 times.
I wrote some parallel code for it, but it's taking much longer than simple for loops on a single machine.
library(parallel)

ratio_sim_par <- function(x1, x2, nrep = 1000) {
  # Initiate cluster, leaving one core free for other operations
  cl <- makeCluster(detectCores() - 1)
  clusterExport(cl, varlist = c("x1", "x2", "nrep"), envir = environment())
  Tboot <- parLapply(cl, 1:nrep, function(i) {
    n1 <- length(x1)
    n2 <- length(x2)
    xx1 <- sample(x1, n1, replace = TRUE) # resample of size n1 from x1
    xx2 <- sample(x2, n2, replace = TRUE) # resample of size n2 from x2
    return(mean(xx1) / mean(xx2))
  })
  stopCluster(cl)
  return(unlist(Tboot))
}

ratio_sim_par(x1, x2, 10000)
The run time is unbearable. Can anyone help me understand the mistake I'm making? Thanks
Upvotes: 3
Views: 1139
Reputation: 2283
Distributing tasks to different nodes carries a lot of computational overhead and can cancel out any gains you make from parallelizing your script. In your case, you're pushing 10,000 tiny tasks through parLapply and probably spending more resources dispatching each one to a worker than actually doing the resampling. Try something like this with a non-parallel version of ratio_sim_par:
mclapply(1:10000, ratio_sim_par, x1, x2, nrep = 1000, mc.cores = n_cores)
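For reference, that non-parallel version could look something like the sketch below. The leading i argument is my own assumption about the wiring: it just absorbs the replicate index that mclapply passes in, and n_cores is assumed to be something like detectCores() - 1.

library(parallel)

# Serial bootstrap: nrep ratio-of-means replicates from x1 and x2.
# The unused first argument i soaks up the index supplied by mclapply.
ratio_sim_par <- function(i, x1, x2, nrep = 1000) {
  n1 <- length(x1)
  n2 <- length(x2)
  vapply(seq_len(nrep), function(j) {
    mean(sample(x1, n1, replace = TRUE)) / mean(sample(x2, n2, replace = TRUE))
  }, numeric(1))
}

n_cores <- detectCores() - 1

Each element of the result is then a numeric vector of 1,000 ratios, and you can unlist the whole thing at the end.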
mclapply will split the job across however many cores you have available and fork the process just once. I'm using mclapply instead of parLapply because I'm used to it and it doesn't require as much setup.
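One caveat: mclapply relies on forking, so it won't actually run in parallel on Windows (mc.cores must be 1 there). If that's your platform, a rough equivalent with parLapply might look like the sketch below, reusing the same assumed serial ratio_sim_par and n_cores from above. Each task now does 1,000 resamples rather than one, so the dispatch overhead is negligible.

cl <- makeCluster(n_cores)
# x1 and x2 are shipped to the workers as arguments, so no clusterExport is needed
res <- parLapply(cl, 1:10000, ratio_sim_par, x1 = x1, x2 = x2, nrep = 1000)
stopCluster(cl)
Tboot <- unlist(res) # one long numeric vector of all the ratios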
Upvotes: 2