Reputation: 167
I'm using the parallel package to get better CPU utilization, and I expected it to reduce computation time significantly. I got the opposite result: CPU utilization reached almost 100% on all 4 of my cores, but the timings show that using parallel is slower than not using it. How can that be? Is this a problem with the package? Am I missing something else? My code is large, so I can't post it here.
| Number of rows | Number of variables | Time without parallel | Time with parallel |
|---|---|---|---|
| 30k | 78 | 45 sec | 1.3 min |
| 50k | 78 | 1.04 min | 1.7 min |
| 70k | 78 | 1.5 min | 2.3 min |
| 70k | 870 | 6.14 min | 14.5 min |
Upvotes: -1
Views: 1055
Reputation: 26823
Before turning to parallel processing, you should try to improve single-core performance. Without seeing your code we cannot give any concrete advice, but the first step should be to profile your code. Useful resources are http://adv-r.had.co.nz/Performance.html and https://csgillespie.github.io/efficientR/.
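As a rough illustration of that first step, here is a minimal sketch using base R's `system.time()` and the sampling profiler `Rprof()`. The matrix `x` and the function `slow_col_means()` are made up for this example and only stand in for the real code, which was not posted.

```r
# Hypothetical workload standing in for the real code (assumption):
# column means of a numeric matrix the size of the smallest case in the question.
set.seed(1)
x <- matrix(rnorm(30000 * 78), nrow = 30000, ncol = 78)

# Deliberately naive implementation with explicit loops (hypothetical)
slow_col_means <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    for (i in seq_len(nrow(m))) {
      out[j] <- out[j] + m[i, j]
    }
  }
  out / nrow(m)
}

# Wall-clock timing of the naive version versus the vectorized built-in
t_slow <- system.time(res_slow <- slow_col_means(x))
t_fast <- system.time(res_fast <- colMeans(x))
print(t_slow["elapsed"])
print(t_fast["elapsed"])

# Sampling profiler: shows which functions the time is spent in
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (k in 1:5) invisible(slow_col_means(x))  # repeat so the profiler collects enough samples
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))
```

If the profile shows most of the time in a few functions, those are the places to optimize before adding workers.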
Once you have achieved good single-core performance, you can try parallel processing. As hinted in the comments, it is crucial to keep the communication overhead low. Again, without seeing your code we cannot give any concrete advice, but here is some general advice:
Split the work into a small number of large chunks, ideally one per worker. The parallel package does that by default as long as you do not use "load balancing". If you need load balancing for some reason, then you should group the tasks into a smaller number of chunks to be handled by the load balancing algorithm.
Upvotes: 2
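To make the chunking advice in the answer above concrete, here is a small sketch with the parallel package. The task list `tasks`, the function `work()`, the worker count of 2, and the choice of 4 chunks per worker are all assumptions made for illustration, not taken from the question.

```r
library(parallel)

n_workers <- 2                 # assumed worker count for this sketch
tasks <- 1:1000                # hypothetical list of many cheap tasks
work <- function(i) sqrt(i)    # hypothetical per-task function

# splitIndices() shows how the task indices can be divided into one chunk per worker
chunks <- splitIndices(length(tasks), n_workers)
print(lengths(chunks))

cl <- makeCluster(n_workers)

# Static scheduling: parLapply() sends each worker one chunk, so few messages are exchanged
res_static <- parLapply(cl, tasks, work)

# Load balancing over a handful of hand-made chunks instead of 1000 single tasks
lb_chunks <- lapply(splitIndices(length(tasks), 4 * n_workers),
                    function(idx) tasks[idx])
res_lb <- clusterApplyLB(cl, lb_chunks,
                         function(chunk, f) lapply(chunk, f), f = work)
res_lb <- unlist(res_lb, recursive = FALSE)  # flatten back to one result per task

stopCluster(cl)

# Both approaches should give the same results in the same order
print(identical(res_static, res_lb))
```

Calling `clusterApplyLB()` directly on `tasks` would dispatch every task as its own message, which is the communication overhead the answer warns about.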