upabove

Reputation: 1119

Parallel computing taking same or more time

I am trying to set up parallel computing in R for a large simulation, but I noticed that there is no improvement in time.

I tried a simple example:

library(foreach)
library(doParallel)

# Sequential version
stime <- system.time(for (i in 1:10000) rnorm(10000))[3]
print(stime)
# 10.823

# Parallel version with two workers
cl <- makeCluster(2)
registerDoParallel(cores = 2)
stime <- system.time(ls <- foreach(s = 1:10000) %dopar% rnorm(10000))[3]
stopCluster(cl)
print(stime)
# 29.526

The elapsed time is more than twice what it was in the original case without parallel computing.

Obviously I am doing something wrong but I cannot figure out what it is.

Upvotes: 0

Views: 401

Answers (1)

Steve Weston

Reputation: 19667

Performing many tiny tasks in parallel can be very inefficient. The standard solution is to use chunking:

ls <- foreach(s=1:2) %dopar% {
  for (i in 1:5000) rnorm(10000)
}

Instead of executing 10,000 tiny tasks in parallel, this loop executes two larger tasks, and runs almost twice as fast as the sequential version on my Linux machine.
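If you don't want to hard-code the number of chunks, the same idea can be written against whatever backend is registered. This is only a minimal sketch, assuming the 10,000 iterations divide evenly by the worker count; getDoParWorkers() comes from the foreach package:

nw <- getDoParWorkers()          # number of registered workers
ls <- foreach(s = 1:nw) %dopar% {
  # each worker runs its share of the 10,000 iterations
  for (i in 1:(10000 / nw)) rnorm(10000)
}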

Also note that your "foreach" example is actually sending a lot of data from the workers to the master. My "foreach" example throws that data away just like your sequential example, so I think it's a better comparison.
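To get a rough feel for how much data that is, you can estimate it on the master: each task returns a vector of 10,000 doubles, and the unchunked loop returns 10,000 of them. This is just a back-of-the-envelope sketch using object.size:

# approximate size, in MB, of the result the unchunked foreach loop
# has to serialize from the workers back to the master
bytes_per_result <- as.numeric(object.size(rnorm(10000)))
bytes_per_result * 10000 / 2^20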

If you need to return a large amount of data then a fair comparison would be:

ls <- lapply(rep(10000, 10000), rnorm)

versus:

ls <- foreach(s=1:2, .combine='c') %dopar% {
  lapply(rep(10000, 5000), rnorm)
}

On my Linux machine the times are 8.6 seconds versus 7.0 seconds. That's not impressive due to the large communication to computation ratio, but it would have been much worse if I hadn't used chunking.

Upvotes: 2
