Karl A


mclapply() performs significantly worse than lapply(). How can I speed things up?

I'm new to parallel computing in R and want to use the parallel package to speed up my computation (which is more complex than the example below). However, the computation takes much longer with mclapply() than with plain lapply().

I did a fresh install of Ubuntu 18.04.2 LTS on my laptop, which has 7.7 GB of memory and an Intel® Core™ i7-4500U CPU @ 1.80GHz × 4. I am running R in RStudio.

library(parallel)

a <- seq(0, 1, length.out = 110) #data
b <-  seq(0, 1, length.out = 110)
c <- replicate(1000, sample(1:100,size=10), simplify=FALSE)

function_A <- function(i, j, k) { # some random function to exemplify the problem
  i + j * pmax(i - k, 0)
}

# nested mclapply: both the outer and the inner loop run in parallel
ptm_mc <- proc.time()  
output <- mclapply(1:NROW(c), function(o){ 
  mclapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply <- proc.time() - ptm_mc

# plain lapply: both loops run serially
ptm_lapply <- proc.time()  
output <- lapply(1:NROW(c), function(o){
  lapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_lapply <- proc.time() - ptm_lapply

lapply is much faster than mclapply:

 > time_mclapply
       user      system     elapsed 
      6.030     439.112     148.088 
 > time_lapply
       user      system     elapsed 
      1.662       0.165       1.827 

Why do I get this result? Is it due to my setup, or is this a common problem? How can I make the parallel version actually faster than lapply, so that the whole computation finishes sooner?

UPDATE: Here are the timings for the two remaining combinations of the nested loops (mclapply only on the outer loop, and mclapply only on the inner loop):

# mclapply on the outer loop, lapply inside
ptm_mc_OUT <- proc.time()
output <- mclapply(1:NROW(c), function(o){
  lapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply_OUT <- proc.time() - ptm_mc_OUT

# lapply on the outer loop, mclapply inside
ptm_mc_IN <- proc.time()
output <- lapply(1:NROW(c), function(o){
  mclapply(1:NROW(a),function(p) function_A(a[p], b, c[[o]]))})
time_mclapply_IN <- proc.time() - ptm_mc_IN

# collect all four timings, one row per variant
times <- data.frame(rbind(time_mclapply,
                          time_lapply,
                          time_mclapply_OUT,
                          time_mclapply_IN))

times

This gives us

> times
                  user.self sys.self elapsed user.child sys.child
time_mclapply         0.075    0.081  22.621      1.933    34.266
time_lapply           1.070    0.049   1.118      0.000     0.000
time_mclapply_OUT     0.064    0.077   0.884      2.539    34.587
time_mclapply_IN      1.329   31.843  37.426      5.108    28.879

On another run I got the following (run times seem to vary quite a bit; is there a better way to display them?):

times_lapply
                   user.self sys.self elapsed user.child sys.child
time_mclapply         0.324    0.121   9.108      0.000     0.000
time_lapply           1.060    0.049   1.108      0.000     0.000
time_mclapply_OUT     0.211    0.092   1.155     10.791    19.632
time_mclapply_IN      1.221   22.196  27.089      5.130    23.032
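One way to make run-to-run variation visible is to repeat each timing a few times and report the median together with the range, instead of a single proc.time() difference. A minimal sketch, assuming base R only; `time_repeated`, `run_lapply` and `c_list` are hypothetical names introduced here (the question's `c` is renamed so it does not mask base::c()), and `reps = 3` is an arbitrary choice:

```r
# Hypothetical helper (not part of base R): run f() several times and
# summarise the elapsed wall-clock seconds, which is less noisy than one run.
time_repeated <- function(f, reps = 3L) {
  elapsed <- vapply(seq_len(reps),
                    function(r) system.time(f())[["elapsed"]],
                    numeric(1))
  c(median = median(elapsed), min = min(elapsed), max = max(elapsed))
}

# Setup from the question; `c` renamed to `c_list` to avoid masking base::c()
a <- seq(0, 1, length.out = 110)
b <- seq(0, 1, length.out = 110)
c_list <- replicate(1000, sample(1:100, size = 10), simplify = FALSE)
function_A <- function(i, j, k) i + j * pmax(i - k, 0)

# The serial (lapply) variant wrapped in a function so it can be re-run
run_lapply <- function() {
  lapply(seq_along(c_list), function(o)
    lapply(seq_along(a), function(p) function_A(a[p], b, c_list[[o]])))
}

timing_lapply <- time_repeated(run_lapply)
print(round(timing_lapply, 3))
```

Each of the four variants can be wrapped in a function the same way and the results combined with rbind() to get one row per variant. Packages such as microbenchmark or bench automate this kind of repeated timing.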



Answers (1)

F. Privé


Let N be the number of threads (logical cores) on your machine. Some recommendations:

  1. Don't use two levels of parallelism: if both the outer and the inner mclapply() use N cores, you can end up with up to N^2 forked processes competing for N threads.

  2. Parallelize the outer loop rather than the inner one, so the overhead of starting the parallel workers is paid only once instead of once per outer iteration.

  3. You should not use all threads (people typically use N-1 or N/2).
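The recommendations above can be turned into concrete mc.cores values with a few base R calls. A minimal sketch; the variable names are made up for illustration. Note that the question's code never passes mc.cores, so each mclapply() call falls back to getOption("mc.cores", 2L), i.e. 2 workers unless that option has been set:

```r
library(parallel)

# N in the answer: number of logical CPUs (hardware threads).
# detectCores() can return NA on some platforms, so fall back to 1.
n_threads <- detectCores()
if (is.na(n_threads)) n_threads <- 1L

cores_n_minus_1 <- max(1L, n_threads - 1L)   # "N - 1"
cores_half      <- max(1L, n_threads %/% 2L) # "N / 2", rounded down to whole cores

# What mclapply() uses when mc.cores is not passed explicitly
default_cores <- getOption("mc.cores", 2L)

cat("N =", n_threads, "| N-1 =", cores_n_minus_1, "| N/2 =", cores_half,
    "| mclapply default =", default_cores, "\n")
```

Integer division (`%/%`) keeps the N/2 value a whole number even when N is odd.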

Using N/2 cores (mc.cores = parallel::detectCores() / 2), the time_mclapply_OUT variant runs about twice as fast as time_lapply.
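A minimal sketch of the layout described in this answer (mclapply() on the outer loop only, plain lapply() inside, N/2 cores), assuming a Unix-alike system where mclapply() can fork (on Windows mc.cores must be 1). `inner`, `n_cores` and `c_list` are names introduced here for illustration; the timings will depend on the machine, so no expected numbers are shown:

```r
library(parallel)

a <- seq(0, 1, length.out = 110)
b <- seq(0, 1, length.out = 110)
c_list <- replicate(1000, sample(1:100, size = 10), simplify = FALSE)  # renamed from `c`
function_A <- function(i, j, k) i + j * pmax(i - k, 0)

# All the work for one outer element, done serially inside a single worker
inner <- function(o) {
  lapply(seq_along(a), function(p) function_A(a[p], b, c_list[[o]]))
}

# N/2 cores as suggested above (at least 1; detectCores() may return NA)
n_threads <- detectCores()
n_cores <- if (is.na(n_threads)) 1L else max(1L, n_threads %/% 2L)

time_serial <- system.time(out_serial <- lapply(seq_along(c_list), inner))
time_outer  <- system.time(out_outer  <- mclapply(seq_along(c_list), inner,
                                                  mc.cores = n_cores))

# Parallelising only the outer loop must not change the result
stopifnot(identical(out_serial, out_outer))
print(rbind(time_serial, time_outer))
```

The user.child and sys.child columns of the mclapply row hold the CPU time spent in the forked workers, which is why user.self alone looks so small for the parallel variants in the question's tables.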

