Taotao Tan

Reputation: 283

R foreach does not improve speed

I am trying to use foreach to parallelize some matrix calculations. However, it performs about the same as sapply and a plain for loop, even though I am using 10 cores.

library(microbenchmark)
library(foreach)
library(doParallel)


mat = matrix(rnorm(3000 * 3000), 3000) # 3000 X 3000
g = rnorm(3000) # 3000 

# use 10 cores
cl <- makeCluster(10)
registerDoParallel(cl)
microbenchmark(
  # multi-core with foreach 
  foreach(i = 1:100, .combine = c) %dopar% {t(g) %*% mat %*% g},
  
  # sapply 
  sapply(1:100, function(i){t(g) %*% mat %*% g}), 
  
  # for loop 
  for(i in 1:100){t(g) %*% mat %*% g},
  
  times = 20)
stopCluster(cl)


On the other hand, if I use a different function (Sys.sleep()), foreach is indeed about 10x faster than sapply and the for loop.

cl <- makeCluster(10)
registerDoParallel(cl)
microbenchmark(
  # multi-core with foreach 
  foreach(i = 1:100, .combine = c) %dopar% {Sys.sleep(0.01) },
  
  # sapply 
  sapply(1:100, function(i){Sys.sleep(0.01)}), 
  
  # for loop 
  for(i in 1:100){Sys.sleep(0.01)},
  
  times = 20)
stopCluster(cl)


What is the reason for this, and how can I improve the performance of the matrix calculation?

Upvotes: 0

Views: 97

Answers (1)

Taotao Tan

Reputation: 283

@Dave2e suggested increasing the number of iterations. Indeed, if I run

cl <- makeCluster(10)
registerDoParallel(cl)
microbenchmark(
  # multi-core with foreach 
  foreach(i = 1:3000, .combine = c) %dopar% {t(g) %*% mat %*% g},
  
  # sapply 
  sapply(1:3000, function(i){t(g) %*% mat %*% g}), 
  
  # for loop 
  for(i in 1:3000){t(g) %*% mat %*% g},
  
  times = 5)
stopCluster(cl)

then foreach takes about 5 s, compared to about 32 s for sapply and the for loop.
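A plausible reading of this result is that every foreach call has a fixed cost (starting tasks and shipping mat and g to the workers), and with only 100 cheap iterations that cost outweighs the parallel gain. I have not profiled this, so treat it as an assumption. Under that assumption, one way to reduce per-task overhead is to hand each worker a single chunk of iterations instead of one iteration per task. Below is a minimal sketch; the sizes n, n_workers and n_iter are illustrative values chosen to keep it quick, not the ones from the benchmark above.

```r
library(foreach)
library(doParallel)

set.seed(1)
n <- 500                       # illustrative size, smaller than the 3000 used above
mat <- matrix(rnorm(n * n), n)
g <- rnorm(n)

n_workers <- 2                 # assumption: adjust to the cores you have
n_iter <- 200                  # illustrative iteration count

cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Split the iteration indices into one contiguous chunk per worker
chunks <- split(seq_len(n_iter), cut(seq_len(n_iter), n_workers, labels = FALSE))

# Each foreach task processes a whole chunk serially inside its worker,
# so there are only n_workers tasks instead of n_iter tasks
res_chunked <- foreach(idx = chunks, .combine = c) %dopar% {
  vapply(idx, function(i) drop(t(g) %*% mat %*% g), numeric(1))
}

stopCluster(cl)

# Serial reference to check that chunking does not change the result
res_serial <- vapply(seq_len(n_iter), function(i) drop(t(g) %*% mat %*% g), numeric(1))
stopifnot(length(res_chunked) == n_iter,
          isTRUE(all.equal(res_chunked, res_serial)))
```

Whether this helps in practice depends on how expensive each iteration is relative to the transfer cost of mat, so it is worth rerunning microbenchmark with your real sizes.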

Upvotes: 1
