Jayden.Cameron

Reputation: 507

doParallel goes slower than sequential processing

I'm trying to get parallel processing to work in my local installation of RStudio, and on RStudio Cloud, using the doParallel package and following the tutorial here.

Unfortunately, turning on parallel processing seems to slow computation, rather than speed it up.

Test operation:

microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

Results without parallel processing

Unit: milliseconds
                                    expr      min       lq    mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161   100

   user  system elapsed 
   0.33    0.04    0.19

Results after turning on parallel processing - takes 2x as much time!

Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407   100

   user  system elapsed 
   0.28    0.10    0.37 

How strange! Any tips? Below I include the full script I ran, as well as the logs from my local RStudio session and from RStudio Cloud.


Full Script

install.packages('doParallel')
library(doParallel)
install.packages('microbenchmark')
library(microbenchmark)

# Without parallel processing
microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))

system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))

# Without parallel processing, get a warning
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

# Turn on parallel with several cores
registerDoParallel(detectCores() - 2)

# See number of cores
getDoParWorkers()

# Test for speed improvement With parallel processing
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))

# Return to one worker
registerDoParallel(1)
registerDoSEQ()

Log from local run:

Restarting R session...


Warning message:
<REDACTED LINE>
Error 6 (The handle is invalid)
Features disabled: R source file indexing, Diagnostics
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection
<REDACTED LINE>
> install.packages('doParallel')
Installing doParallel [1.0.16] ...
    OK [linked cache]
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: package ‘doParallel’ was built under R version 4.0.3 
2: package ‘foreach’ was built under R version 4.0.3 
3: package ‘iterators’ was built under R version 4.0.3 
> install.packages('microbenchmark')
Installing microbenchmark [1.4-7] ...
    OK [linked cache]
> library(microbenchmark)
Warning message:
package ‘microbenchmark’ was built under R version 4.0.3 
> 
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
                                    expr      min       lq    mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161   100
> 
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
   user  system elapsed 
   0.33    0.04    0.19 
> 
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min      lq     mean   median       uq     max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 178.1788 188.879 213.9808 197.2124 227.6921 698.484   100
Warning message:
executing %dopar% sequentially: no parallel backend registered 
> 
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
   user  system elapsed 
   0.22    0.03    0.25 
> 
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
> 
> # See number of cores
> getDoParWorkers()
[1] 6
> 
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407   100
> 
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
   user  system elapsed 
   0.28    0.10    0.37 
> 
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()

Log from RStudio Cloud:

Restarting R session...

> install.packages('doParallel')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/doParallel_1.0.16.tar.gz'
Content type 'application/x-tar' length 59776 bytes (58 KB)
==================================================
downloaded 58 KB

* installing *binary* package ‘doParallel’ ...
* DONE (doParallel)

The downloaded source packages are in
    ‘/tmp/RtmplDZYAT/downloaded_packages’
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> install.packages('microbenchmark')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/microbenchmark_1.4-7.tar.gz'
Content type 'application/x-tar' length 61382 bytes (59 KB)
==================================================
downloaded 59 KB

* installing *binary* package ‘microbenchmark’ ...
* DONE (microbenchmark)

The downloaded source packages are in
    ‘/tmp/RtmplDZYAT/downloaded_packages’
> library(microbenchmark)
> 
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
                                    expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 121.6417 126.5681 130.8152 129.7511 133.3043 171.6484   100
> 
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
   user  system elapsed 
  0.126   0.000   0.126 
> 
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 117.6518 124.2508 127.9016 127.1467 129.9798 171.9952   100
Warning message:
executing %dopar% sequentially: no parallel backend registered 
> 
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
   user  system elapsed 
  0.169   0.000   0.169 
> 
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
> 
> # See number of cores
> getDoParWorkers()
[1] 14
> 
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 262.9285 302.7655 340.1377 325.8734 359.3806 707.4004   100
> 
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
   user  system elapsed 
  0.136   0.176   0.313 
> 
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
> 

Upvotes: 0

Views: 652

Answers (1)

polkas

Reputation: 4184

Summing up: on Linux you should use the mclapply function to get better performance.

There are a few issues here. First of all, not every task is well suited to multiprocessing, and yours (small toy tasks) does not look like a good candidate. Another thing is that parallel processing in R can be divided into multisession and forking (multicore) approaches. Check this question to find out why that distinction is so important: R mclapply vs foreach
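
To make the distinction concrete, here is a minimal sketch (assuming a Unix-like system with at least 4 available cores; the mechanics, not the exact timings, are the point): a PSOCK cluster starts separate R sessions and ships each task to them over connections, while mclapply simply forks the current session.

library(parallel)

# Multisession-style: a PSOCK cluster launches separate R processes and
# sends each task (plus any data it needs) over socket connections.
cl <- makeCluster(4)
psock_res <- parLapply(cl, 1:1000, function(i) sum(tanh(1:i)))
stopCluster(cl)

# Fork-style (Unix only): mclapply forks the current R process, so the
# workers already share its memory and no per-session setup is needed.
fork_res <- mclapply(1:1000, function(i) sum(tanh(1:i)), mc.cores = 4)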

On Linux you should use forking, which is much more efficient. Since foreach here runs multisession workers (not forked ones), it has to create separate R sessions and communicate between them. For such a small toy example, that extra overhead is quite significant.
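
As a rough illustration of the mclapply suggestion (a sketch only, assuming a Linux machine with at least 4 cores; mc.cores = 4 and times = 10 are arbitrary choices and your timings will vary), the same work can be run through mclapply and compared against a plain sequential lapply:

library(parallel)
library(microbenchmark)

microbenchmark(
  sequential = lapply(1:1000, function(i) sum(tanh(1:i))),
  forked     = mclapply(1:1000, function(i) sum(tanh(1:i)), mc.cores = 4),
  times = 10
)

Even with forking, tasks this small may not show a speedup: each sum(tanh(1:i)) finishes in microseconds, so scheduling overhead can still dominate until each task does enough work on its own.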

Upvotes: 1
