Reputation: 39
What is the optimal number of processes per core? Say you're given a machine with 2 CPUs with 4 cores each: what number of processes will give you the best performance?
Upvotes: 3
Views: 3745
Reputation: 19736
The answer is, naturally, that it depends. Obviously, if you're interested in the performance of one particular single-threaded application, other processes just clutter your machine and compete for the shared resources. So let's look at two cases where this question may be interesting: running several independent processes, and running a single application that splits its work across multiple threads.
The second case is easier to answer: it (.. wait for it ..) depends on what you're running! If you have locks, more threads may lead to higher contention and more conflicts. If you're lock-free (or even use some flavors of wait-free), you may still have fairness issues. It also depends on how the work is balanced internally in your application and on how your task schedulers work. There are simply too many possible designs out there today to give one number.
If we assume you have perfect balancing between your threads and no overhead from increasing their number, you can perhaps align this with the other use case, where you simply run multiple independent processes. In that case, performance may have several sweet spots. The first is when you reach the number of physical cores (in your case 8, assuming you have 4 physical cores per socket). At that point, every core has work to do. However, if some SMT mechanism (like Hyperthreading) is supported, you can double the number of logical cores, with 2 logical cores per physical one. This doesn't add any resources to the story, it just splits the existing ones, so each process may run somewhat slower, but on the other hand you can run twice as many processes simultaneously.
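As a rough sketch of the arithmetic above, the candidate sweet spots for a given topology could be computed like this. The function name and its parameters are made up for illustration. Note that `os.cpu_count()` reports logical CPUs, so on a machine with SMT enabled it already includes the doubling.

```python
import os


def candidate_process_counts(sockets, cores_per_socket, threads_per_core=1):
    """Return the process counts worth benchmarking first (hypothetical helper).

    physical = one process per physical core
    logical  = one process per hardware thread (only differs with SMT)
    """
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    return sorted({physical, logical})


# The machine from the question: 2 sockets with 4 cores each.
print(candidate_process_counts(2, 4))        # [8]
print(candidate_process_counts(2, 4, 2))     # [8, 16] with 2-way SMT

# What the OS reports on the machine running this script (logical CPUs).
print(os.cpu_count())
```

These counts are only starting points for measurement, not a guarantee of where the best throughput lies.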
The overall aggregate speedup may vary, but I've seen numbers of up to 30% on average on generic benchmarks. As a rule of thumb, processes that are bound by memory latency or have complicated control flow can benefit from this, since the core can still make progress when one thread is blocked. Code that is limited by execution bandwidth (like heavy floating-point calculations) or by memory bandwidth wouldn't gain as much.
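To make the trade-off concrete, here is a small worked calculation. It assumes the 30% figure applies uniformly and that SMT is 2-way, which is a simplification; `smt_tradeoff` is a name invented for this sketch.

```python
def smt_tradeoff(physical_cores, aggregate_gain):
    """Estimate total throughput and per-process speed with 2-way SMT.

    Throughput is measured in units of "one process running alone on a core".
    aggregate_gain is the assumed overall speedup from SMT (0.30 means 30%).
    """
    baseline = float(physical_cores)            # one process per physical core
    with_smt = baseline * (1 + aggregate_gain)  # two processes per physical core
    per_process_speed = with_smt / (2 * physical_cores)
    return with_smt, per_process_speed


total, per_process = smt_tradeoff(8, 0.30)
print(f"total throughput:  {total:.2f}")        # 10.40 (vs 8.00 without SMT)
print(f"per-process speed: {per_process:.2f}")  # 0.65 of its solo speed
```

So under these assumptions, 16 processes finish more total work per second than 8, but each individual process takes noticeably longer.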
Beyond that number of processes, it may still be beneficial in some cases to add more. They won't run in parallel, but if the overhead of context switches isn't too high, and you want to minimize the average wait time (which is also a way to look at performance that's not pure IPC), or you depend on getting output out as early as possible, there are scenarios where this is useful.
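The wait-time argument can be illustrated with a toy single-core simulation. This is a simplified model, not how any real scheduler works: job lengths, the quantum, and the switch cost are made-up numbers, and both function names are invented for this sketch.

```python
from collections import deque


def fifo_completion_times(durations):
    """Run jobs one at a time, each to completion, in the given order."""
    t = 0.0
    finish = []
    for d in durations:
        t += d
        finish.append(t)
    return finish


def round_robin_completion_times(durations, quantum, switch_cost=0.0):
    """Time-slice all jobs on one core, paying switch_cost per real switch."""
    remaining = list(durations)
    finish = [0.0] * len(durations)
    queue = deque(range(len(durations)))
    t = 0.0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            if queue:              # another job is waiting, so we really switch
                t += switch_cost
            queue.append(i)
        else:
            finish[i] = t
            if queue:              # switch to the next waiting job
                t += switch_cost
    return finish


def mean(xs):
    return sum(xs) / len(xs)


jobs = [10, 1, 1]  # one long job queued ahead of two short ones
print(f"{mean(fifo_completion_times(jobs)):.2f}")                    # 11.00
print(f"{mean(round_robin_completion_times(jobs, 1)):.2f}")          # 5.67
print(f"{mean(round_robin_completion_times(jobs, 1, 0.1)):.2f}")     # 5.87
```

In this toy case the long job finishes slightly later once switches cost something, but the short jobs no longer wait behind it, so the average completion time drops.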
One last point - the "optimal" number of processes may even be lower than the number of cores if your processes saturate other resources before reaching that point. If, for example, each thread requires a huge chunk of virtual memory, you may start thrashing, with pages constantly being swapped out and back in (a painful penalty). If each thread has a large data set which it uses over and over, you could fill up your shared cache and start losing from that point on by adding more threads. The same goes for heavy IO, and so on.
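For the memory case, a back-of-the-envelope cap might look like the sketch below. It assumes every process has the same working set and that the available memory figure is known; the helper name and the example sizes are hypothetical.

```python
GIB = 1024 ** 3


def max_useful_processes(logical_cpus, available_bytes, working_set_bytes):
    """Cap the process count by whichever runs out first: CPUs or memory."""
    fit_in_memory = available_bytes // working_set_bytes
    return max(1, min(logical_cpus, fit_in_memory))


# 8 cores, 16 GiB available, 3 GiB working set per process.
print(max_useful_processes(8, 16 * GIB, 3 * GIB))   # 5, memory is the limit
# Same machine with 64 GiB available.
print(max_useful_processes(8, 64 * GIB, 3 * GIB))   # 8, CPUs are the limit
```

The same kind of bound could be written for shared cache size or IO bandwidth, though those are much harder to estimate up front than to measure.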
As you can see, there's no single right answer here. You simply need to benchmark your code with different process counts, on the systems you actually care about.
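A minimal sketch of such a benchmark, using only the standard library, could look like this. The workload in `WORKER_CODE` is a made-up CPU-bound stand-in, and `run_batch` and `sweep` are names invented here; substitute your real command to get meaningful numbers.

```python
import subprocess
import sys
import time

# Hypothetical CPU-bound stand-in; replace with the program you care about.
WORKER_CODE = "sum(i * i for i in range(200_000))"


def run_batch(num_procs, worker_code=WORKER_CODE):
    """Start num_procs independent processes; return seconds until all finish."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", worker_code])
        for _ in range(num_procs)
    ]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def sweep(counts, worker_code=WORKER_CODE):
    """Return {process count: completed jobs per second}."""
    results = {}
    for n in counts:
        elapsed = run_batch(n, worker_code)
        results[n] = n / elapsed
    return results


if __name__ == "__main__":
    for n, throughput in sweep([1, 2, 4, 8, 16]).items():
        print(f"{n:>3} processes: {throughput:.2f} jobs/s")
```

With a workload this short, interpreter startup time makes up a large part of each measurement, so use a longer-running job and repeat each count several times before drawing conclusions. Throughput is also only one metric; if you care about latency, record per-process completion times as well.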
Upvotes: 5