Reputation: 15
My parallel program is implemented in C++ with OpenMPI. When I test it, I find that using more CPUs makes it take *more* time. How can this happen?
The structure of my code is as follows:
BEGIN
MPI::Init
if (rank == 0)
    { read files }
MPI::Bcast
MPI::Scatter
for (i = 0; i < N; i++)
{
    do something here
    MPI::Gather
    if (rank == 0)
        { save result }
}
MPI::Finalize()
END
I am confused about this.
Upvotes: 1
Views: 586
Reputation: 1529
For your program, the amount of code that runs in parallel determines the performance; see Amdahl's law: http://en.wikipedia.org/wiki/Amdahl's_law. Many other parameters affect performance too, such as your computer architecture: on a shared-memory system, memory matters; in your code, if the files are big they can decrease performance, and in that case you should use derived datatypes for communication; for distributed systems, network speed is important, and so on.
Upvotes: 1
Reputation: 178411
It is hard to know without more information on the environment and the actual code that is being run, but note that MPI::Gather() and MPI::Bcast() are blocking calls: each process must wait for all processes to reach that point. If one CPU is extremely slow, waiting for it to reach the Bcast() will slow down the total time.
Upvotes: 1
Reputation: 78306
Extended comment, not answer:
@111111's comment, that unless the workload is large enough then parallelisation can actually slow computations, is correct. Since you only post an outline of your code we can't unequivocally diagnose this as the root of your problem, but it's not an unreasonable conclusion to jump to.
In general, you cannot expect a parallel version of a serial program to be faster under all circumstances. There is a cost to parallelisation (sometimes called 'parallel overhead'). In your code, for example, the broadcast and scatter operations contribute to this overhead: you only do them in a parallel code, and if they are time-consuming they can cancel out (or worse) the benefits of faster computation on multiple CPUs.
I'm going to go ahead and guess that you are relatively new to parallel programming and suggest that this issue, of the costs and benefits of parallelisation, is one that you should study with respect to your code and your problems. You should definitely aim to develop a good understanding, one backed up by data derived from experiments, of how the performance of your program(s) scales when you increase the job size and when you increase the number of processors.
EDIT
One further minor point: make sure you use the right routines for timing your program. I suggest you use MPI_Wtime(). I have seen naive programmers use calls to things like utime and end up adding together the time used by all N processors; all you should be interested in is the wall-clock time from start to finish (or between 2 points of interest).
Upvotes: 2