Reputation: 507
I'm trying to get parallel processing to work on my local installation of RStudio and on RStudio Cloud using the doParallel package, following the tutorial here.
Unfortunately, turning on parallel processing seems to slow computation down rather than speed it up.
Local run, sequential (%do%):
microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
                                    expr      min       lq    mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161   100
   user  system elapsed
   0.33    0.04    0.19
Local run, parallel with 6 workers (%dopar%):
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407   100
   user  system elapsed
   0.28    0.10    0.37
How strange! Any tips? Below I include the full script I ran, as well as the logs from my local RStudio session and from RStudio Cloud.
Full Script
install.packages('doParallel')
library(doParallel)
install.packages('microbenchmark')
library(microbenchmark)
# Without parallel processing
microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
# Without parallel processing, get a warning
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Turn on parallel with several cores
registerDoParallel(detectCores() - 2)
# See number of cores
getDoParWorkers()
# Test for speed improvement With parallel processing
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Return to one worker
registerDoParallel(1)
registerDoSEQ()
Log from local run:
Restarting R session...
Warning message:
<REDACTED LINE>
Error 6 (The handle is invalid)
Features disabled: R source file indexing, Diagnostics
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection
<REDACTED LINE>
> install.packages('doParallel')
Installing doParallel [1.0.16] ...
OK [linked cache]
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: package ‘doParallel’ was built under R version 4.0.3
2: package ‘foreach’ was built under R version 4.0.3
3: package ‘iterators’ was built under R version 4.0.3
> install.packages('microbenchmark')
Installing microbenchmark [1.4-7] ...
OK [linked cache]
> library(microbenchmark)
Warning message:
package ‘microbenchmark’ was built under R version 4.0.3
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
                                    expr      min       lq    mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161   100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.33 0.04 0.19
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min      lq     mean   median       uq     max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 178.1788 188.879 213.9808 197.2124 227.6921 698.484   100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.22 0.03 0.25
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 6
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407   100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.28 0.10 0.37
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
Log from RStudio cloud:
Restarting R session...
> install.packages('doParallel')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/doParallel_1.0.16.tar.gz'
Content type 'application/x-tar' length 59776 bytes (58 KB)
==================================================
downloaded 58 KB
* installing *binary* package ‘doParallel’ ...
* DONE (doParallel)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> install.packages('microbenchmark')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/microbenchmark_1.4-7.tar.gz'
Content type 'application/x-tar' length 61382 bytes (59 KB)
==================================================
downloaded 59 KB
* installing *binary* package ‘microbenchmark’ ...
* DONE (microbenchmark)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(microbenchmark)
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
                                    expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 121.6417 126.5681 130.8152 129.7511 133.3043 171.6484   100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.126 0.000 0.126
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 117.6518 124.2508 127.9016 127.1467 129.9798 171.9952   100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.169 0.000 0.169
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 14
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 262.9285 302.7655 340.1377 325.8734 359.3806 707.4004   100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.136 0.176 0.313
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
>
Upvotes: 0
Views: 652
Reputation: 4184
In short, you should use the mclapply function on Linux to get better performance.
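A minimal sketch of the question's workload rewritten with parallel::mclapply. It assumes a Unix-alike (Linux or macOS): mclapply forks the running R process, and on Windows it only accepts mc.cores = 1, so the sketch falls back to one core there. The choice of 2 cores is arbitrary and only for illustration.

```r
library(parallel)

# Same toy workload as in the question: for each i, compute sum(tanh(1:i)).
f <- function(i) sum(tanh(1:i))

# mclapply relies on forking, which Windows does not support,
# so use a single core there (assumption: 2 cores elsewhere).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res_fork <- mclapply(1:1000, f, mc.cores = n_cores)
res_seq  <- lapply(1:1000, f)

# Rough wall-clock comparison. For a workload this small the two may be
# close, because forking and collecting the results also cost time.
system.time(mclapply(1:1000, f, mc.cores = n_cores))
system.time(lapply(1:1000, f))
```

Both calls return a list of 1000 numbers in the same order, so the forked version can be swapped in for lapply without changing downstream code.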
There are a few issues here. First, not all tasks are suited to multiprocessing, and yours (a toy example made of many very small tasks) does not look well suited to it. Second, multiprocessing in R can be divided into multisession and forking (multicore) approaches. Check this question to find out why this distinction is so important: R mclapply vs foreach
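The difference between the two approaches can be sketched with the base parallel package. This is an illustration under assumptions: the variable offset and the 2-worker cluster are hypothetical, and the forking branch only runs on Unix-alikes. A socket (PSOCK) cluster starts fresh R sessions, so data has to be copied to them explicitly; forked children are copies of the current session and already see it.

```r
library(parallel)

offset <- 10  # hypothetical data living in the main session

# Multisession: makeCluster() starts new, empty R processes
# (type = "PSOCK" is the default), so `offset` must be exported to them.
cl <- makeCluster(2)
clusterExport(cl, "offset", envir = environment())
res_psock <- parLapply(cl, 1:4, function(i) i + offset)
stopCluster(cl)  # shut the worker sessions down

# Forking: the children are copies of this process, so `offset`
# is visible without exporting anything. Not available on Windows.
if (.Platform$OS.type != "windows") {
  res_fork <- mclapply(1:4, function(i) i + offset, mc.cores = 2)
} else {
  res_fork <- lapply(1:4, function(i) i + offset)  # sequential fallback
}
```

Starting the sessions and copying data to and from them is exactly the kind of fixed cost that dominates when each task only takes a fraction of a millisecond.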
On Linux you should use forking, which will usually be much more efficient. Note that mclapply forks the running R process; it does not use threads.
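If you would rather keep the foreach syntax, a possible sketch is to register doParallel with a core count instead of a cluster object. As far as I understand the doParallel documentation, registerDoParallel(cores = n) uses its multicore (forking) backend on Linux and macOS, while on Windows it creates an implicit socket cluster instead; treat that platform behaviour and the choice of 2 cores as assumptions to verify with getDoParName().

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Register by core count rather than by cluster object
# (assumption: forking backend on Unix-alikes, implicit cluster on Windows).
registerDoParallel(cores = 2)
getDoParName()  # name of the backend that was actually registered

res <- foreach(i = 1:1000, .combine = c) %dopar% sum(tanh(1:i))

# Stop any implicit cluster (nothing to stop for the forking backend)
# and return to sequential execution.
stopImplicitCluster()
registerDoSEQ()
```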
foreach with a socket-cluster backend (which is what registerDoParallel() creates on Windows), on the other hand, is multisession: it has to start separate R sessions and communicate with them. For such a small toy example, this additional overhead is significant.
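One way to reduce that communication overhead, sketched here under assumptions (an explicit 2-worker socket cluster and a simple split into contiguous blocks, both illustrative choices), is to send each worker one large block of indices instead of 1000 tiny tasks, so there are only a couple of round-trips between sessions.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)  # explicit socket cluster: separate R sessions
registerDoParallel(cl)

# Split 1:1000 into one contiguous block per worker.
idx    <- 1:1000
chunks <- split(idx, cut(idx, breaks = length(cl), labels = FALSE))

res <- foreach(ch = chunks, .combine = c) %dopar% {
  # each worker loops over its whole block locally
  vapply(ch, function(i) sum(tanh(1:i)), numeric(1))
}

stopCluster(cl)
registerDoSEQ()
```

Because iteration i costs roughly in proportion to i, the second block does more work than the first; this only illustrates cutting down the number of messages, not a tuned load balancer.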
Upvotes: 1