migdal menora

Reputation: 167

Why does using the parallel package make my R code run slower?

I'm using the parallel package to get better CPU utilization, and I thought it would reduce computation time significantly. But I got the opposite result: while CPU utilization reached almost 100% on all 4 of my cores, the timings show that using parallel is slower than not using it. How can that be? Is this a problem with the package? Am I missing something else? My code is big, so I can't present it here.

time without parallel   45 sec    1.04 min  1.5 min   6.14 min
time with parallel      1.3 min   1.7 min   2.3 min   14.5 min
number of variables     78        78        78        870
number of rows          30k       50k       70k       70k

Upvotes: -1

Views: 1055

Answers (1)

Ralf Stubner

Reputation: 26823

Before turning to parallel processing, you should try to improve single-core performance. Without seeing your code we cannot give any concrete advice, but the first step should be to profile your code. Useful resources are http://adv-r.had.co.nz/Performance.html and https://csgillespie.github.io/efficientR/.
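Since the question's code is not available, here is only a minimal sketch of what profiling could look like. The function `slow_colmeans` and the simulated data frame are hypothetical stand-ins for the real workload; `system.time`, `Rprof` and `summaryRprof` are base R tools.

```r
# Hypothetical stand-in for the real workload: computes column means
# while growing the result vector inside a loop.
slow_colmeans <- function(df) {
  out <- numeric(0)
  for (j in seq_along(df)) {
    out <- c(out, mean(df[[j]]))
  }
  out
}

# Simulated data, sized like the smallest case in the question (assumption)
set.seed(1)
df <- as.data.frame(matrix(rnorm(30000 * 78), ncol = 78))

# Coarse wall-clock timing of a single call
print(system.time(res_slow <- slow_colmeans(df)))

# Sampling profiler: repeat the call so that enough samples are collected
prof_file <- tempfile()
Rprof(prof_file)
for (i in 1:200) res_slow <- slow_colmeans(df)
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))

# A vectorised alternative to compare against once the hot spot is known
res_fast <- colMeans(df)
```

The profiler output shows which functions take the most time, which is where single-core optimisation should start.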

Once you have achieved good single core performance, you can try parallel processing. As hinted in the comments, it is crucial to keep the communication overhead low. Again, without seeing your code we cannot give any concrete advice, but here is some general advice:

  • Do not use a sequence of multiple parallelized steps. A single parallelized step in which each worker does all of its work in sequence will have lower communication overhead.
  • Use a reasonable chunk size. If you have 10,000 tasks, then don't send them individually but in suitable groups. The parallel package does that by default as long as you do not use "load balancing". If you need load balancing for some reason, then you should group the tasks into a smaller number of chunks to be handled by the load balancing algorithm.
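Both points can be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: `do_all`, the task vector, the cluster of 2 workers and the choice of 20 chunks are all hypothetical. `makeCluster`, `parLapply`, `clusterApplyLB`, `splitIndices` and `stopCluster` are functions from the parallel package.

```r
library(parallel)

# Hypothetical per-task work: both steps are done inside one function,
# so each task needs only one round trip to a worker.
do_all <- function(x) {
  y <- x^2   # step 1
  y + 1      # step 2
}

tasks <- 1:10000
cl <- makeCluster(2)  # assumption: 2 workers; adjust to your machine

# Without load balancing: parLapply splits the tasks into one chunk per
# worker, so only a few messages are exchanged.
res_chunked <- parLapply(cl, tasks, do_all)

# With load balancing: clusterApplyLB sends one element at a time, so the
# tasks are first grouped manually into a moderate number of chunks.
chunk_list <- lapply(splitIndices(length(tasks), 20), function(i) tasks[i])
res_lb <- clusterApplyLB(
  cl, chunk_list,
  function(chunk, worker_fun) lapply(chunk, worker_fun),
  worker_fun = do_all
)
res_lb <- unlist(res_lb, recursive = FALSE)  # flatten back to one list

stopCluster(cl)
```

The number of chunks is a trade-off: more chunks give the load balancer more freedom, fewer chunks mean less communication.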

Upvotes: 2
