Georgia S

Reputation: 622

MPI - No performance gain when using every available core on the machine

I have a C program (an acoustic wave solver) that is parallelized with MPI. However, while testing the speedup on various numbers of cores, I've noticed something strange: if I use N processes, where N is the number of available cores on the machine, I do not see any performance improvement over the next step down.

So on my 8-core machine I see a speedup from 1 process to 2 processes to 4 processes, but not from 4 to 8. Similarly, on my 4-core laptop I see a speedup from 1 to 2, but not from 2 to 4.

Any idea what could be causing this?

Upvotes: 1

Views: 481

Answers (1)

jotasi

Reputation: 5177

Many modern (Intel) CPUs run two hyperthreads on each physical core. The number of cores you are referring to is actually the number of hardware threads (logical CPUs) that are available, not the number of physical execution units.

As long as the number of processes is less than or equal to the number of physical cores, the processes will (or at least should) be distributed so that each one gets its own physical core. But as soon as all physical cores are taken, each additional process has to share a physical core with another process.

It is not possible to give a definitive answer as to whether, or by how much, using all hardware threads will increase your performance. That strongly depends on the code you are running. A very nice answer to a similar question is given on superuser.com. Essentially, if your processes are memory-bound or use different parts of the CPU (integer/floating-point arithmetic, video encoding, vector processing, ...) and communication overhead is small, you might even get perfect scaling. Code that is CPU-bound and only does one type of computation might not give any improvement, or might even take longer due to communication overhead.

Upvotes: 1
