Reputation: 178
I'm new to parallel computing in R and want to use the parallel package to speed up my computation (which is more complex than the example below). However, the computations take way longer when using the mclapply function compared to the usual lapply function.
I installed a fresh Ubuntu 18.04.2 LTS on my laptop, which has 7.7 GB of memory and an Intel® Core™ i7-4500U CPU @ 1.80GHz × 4 processor. I am running R in RStudio.
require(parallel)
a <- seq(0, 1, length.out = 110) #data
b <- seq(0, 1, length.out = 110)
c <- replicate(1000, sample(1:100,size=10), simplify=FALSE)
function_A <- function(i, j, k) { # some arbitrary function to exemplify the problem
  i + j * pmax(i - k, 0)
}
#running it with mclapply
ptm_mc <- proc.time()
output <- mclapply(1:NROW(c), function(o){
mclapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply <- proc.time() - ptm_mc
# running it with lapply
ptm_lapply <- proc.time()
output <- lapply(1:NROW(c), function(o){
lapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_lapply <- proc.time() - ptm_lapply
The lapply version runs a lot faster than the mclapply version:
> time_mclapply
user system elapsed
6.030 439.112 148.088
> time_lapply
user system elapsed
1.662 0.165 1.827
Why do I get this result? Is it because of my setup or a common problem? How can I get results that are actually faster than the lapply results, so the whole thing will be quicker?
UPDATE: An update on the two remaining combinations of the nested loops:
ptm_mc_OUT <- proc.time()
output <- mclapply(1:NROW(c), function(o){
lapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply_OUT <- proc.time() - ptm_mc_OUT
ptm_mc_IN <- proc.time()
output <- lapply(1:NROW(c), function(o){
mclapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply_IN <- proc.time() - ptm_mc_IN
require(dplyr)
times <- rbind(time_mclapply,
time_lapply,
time_mclapply_OUT,
time_mclapply_IN) %>% data.frame()
times
This gives us
> times
user.self sys.self elapsed user.child sys.child
time_mclapply 0.075 0.081 22.621 1.933 34.266
time_lapply 1.070 0.049 1.118 0.000 0.000
time_mclapply_OUT 0.064 0.077 0.884 2.539 34.587
time_mclapply_IN 1.329 31.843 37.426 5.108 28.879
On another run I got the following, so run times seem to vary quite a bit. Is there a better way to display them?
times_lapply
user.self sys.self elapsed user.child sys.child
time_mclapply 0.324 0.121 9.108 0.000 0.000
time_lapply 1.060 0.049 1.108 0.000 0.000
time_mclapply_OUT 0.211 0.092 1.155 10.791 19.632
time_mclapply_IN 1.221 22.196 27.089 5.130 23.032
Upvotes: 4
Views: 2971
Reputation: 11738
Let N be the number of threads of your machine. Some recommendations:
You should not use two levels of parallelism as you will use N^2 threads.
You should try to parallelize the outer loop instead of the inner one (so that the overhead of parallelism is incurred only once).
You should not use all threads (people typically use N-1 or N/2).
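A minimal sketch of these recommendations applied to the question's example, assuming a Unix-like system (mclapply relies on forking, so on Windows only mc.cores = 1 is accepted). The names n_cores and n_workers are introduced here for illustration and are not from the original post. Only the outer loop is parallelized; the inner loop stays a plain lapply:

```r
library(parallel)

# Data and function from the question
a <- seq(0, 1, length.out = 110)
b <- seq(0, 1, length.out = 110)
c <- replicate(1000, sample(1:100, size = 10), simplify = FALSE)
function_A <- function(i, j, k) {
  i + j * pmax(i - k, 0)
}

# Use roughly half of the detected cores, but at least one.
# detectCores() can return NA on some platforms, hence the fallback.
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores %/% 2L)
# Forking is unavailable on Windows, so fall back to a single worker there
if (.Platform$OS.type == "windows") n_workers <- 1L

# One level of parallelism: mclapply outside, ordinary lapply inside
output <- mclapply(seq_along(c), function(o) {
  lapply(seq_along(a), function(p) function_A(a[p], b, c[[o]]))
}, mc.cores = n_workers)
```

The result has the same shape as the serial version: a list of 1000 elements, each a list of 110 numeric vectors.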
When using N/2 (mc.cores = parallel::detectCores() / 2), time_mclapply_OUT is twice as fast as time_lapply.
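Since single timings vary between runs (as the two tables in the question show), one way to check a comparison like this on your own machine is to repeat each variant a few times and compare the median elapsed time. This is only a sketch under the same Unix-like assumption; run_once, par_apply, time_median and n_reps are hypothetical names chosen for illustration, and the actual numbers will depend on the hardware:

```r
library(parallel)

# Data and function from the question
a <- seq(0, 1, length.out = 110)
b <- seq(0, 1, length.out = 110)
c <- replicate(1000, sample(1:100, size = 10), simplify = FALSE)
function_A <- function(i, j, k) i + j * pmax(i - k, 0)

# Half of the detected cores, at least one; a single worker on Windows
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores %/% 2L)
if (.Platform$OS.type == "windows") n_workers <- 1L

# Run the nested computation once; outer_apply drives the outer loop
run_once <- function(outer_apply) {
  outer_apply(seq_along(c), function(o) {
    lapply(seq_along(a), function(p) function_A(a[p], b, c[[o]]))
  })
}

# Wrapper so that mclapply has the same two-argument shape as lapply
par_apply <- function(X, FUN) mclapply(X, FUN, mc.cores = n_workers)

# Median elapsed wall-clock time in seconds over n_reps repetitions
time_median <- function(outer_apply, n_reps = 5) {
  elapsed <- replicate(n_reps, system.time(run_once(outer_apply))[["elapsed"]])
  median(elapsed)
}

timings <- c(lapply = time_median(lapply),
             mclapply_outer = time_median(par_apply))
print(timings)
```

Using the median rather than a single run makes the comparison less sensitive to one unusually slow or fast repetition.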
Upvotes: 6