Reputation: 1986
I have a very simple MPI program to test the behavior of MPI_Reduce. My objectives are simple:
- Start by having each process create a random number (range 0-99, via rand() % 100)
- Then run the program with mpirun -np 5 <program_name_here>
And here's my program:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>

int sum = 0;
int product = 0;
int max = 0;
int min = 0;
int bitwiseAnd = 0;

int main(int argc, char **argv)
{
    int my_id, num_procs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    int num;
    srand(time(NULL) * my_id);
    num = rand() % 100; // give num the random number

    printf("Process #%i: Here is num: %i\n", my_id, num);

    if (my_id == 0) {
        printf("Okay it entered 0\n");
        MPI_Reduce(&num, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    } else if (my_id == 1) {
        printf("Okay it entered 1\n");
        MPI_Reduce(&num, &product, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
    } else if (my_id == 2) {
        printf("Okay it entered 2\n");
        MPI_Reduce(&num, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
    } else if (my_id == 3) {
        printf("Okay it entered 3\n");
        MPI_Reduce(&num, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    } else if (my_id == 4) {
        printf("Okay it entered 4\n");
        MPI_Reduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, 0, MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if (my_id == 0) {
        printf("I am process %i and the sum is %i\n", my_id, sum);
        printf("I am process %i and the product is %i\n", my_id, product);
        printf("I am process %i and the max is %i\n", my_id, max);
        printf("I am process %i and the min is %i\n", my_id, min);
        printf("I am process %i and the bitwiseAdd is %i\n", my_id, bitwiseAnd);
    }

    MPI_Finalize();
}
This produces output like this:
[blah@blah example]$ mpirun -np 5 all
Process #2: Here is num: 21
Okay it entered 2
Process #4: Here is num: 52
Okay it entered 4
Process #0: Here is num: 83
Okay it entered 0
Process #1: Here is num: 60
Okay it entered 1
Process #3: Here is num: 66
Okay it entered 3
I am process 0 and the sum is 282
I am process 0 and the product is 0
I am process 0 and the max is 0
I am process 0 and the min is 0
I am process 0 and the bitwiseAdd is 0
[blah@blah example]$
Why doesn't process 0 pick up the MPI_Reduce results from the other processes?
Upvotes: 2
Views: 1398
Reputation: 22660
The answer from zwol is basically correct, but I would like to confirm his hypothesis: MPI_Reduce is a collective operation; it has to be called by all members of the communicator argument. In the case of MPI_COMM_WORLD, this means all initial ranks in the application.
The MPI standard (5.9.1) is also helpful here:
The routine is called by all group members using the same arguments for count, datatype, op, root and comm. Thus, all processes provide input buffers of the same length [...]
It is important to understand that the root is not the one doing all the computation. The operation is performed in a distributed fashion, usually using a tree algorithm. This means only a logarithmic number of time steps is needed, which is much more efficient than collecting all data at the root and performing the operation there, especially for large numbers of ranks.
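To make the tree idea concrete, here is a minimal hand-rolled sketch of a binomial-tree sum reduction to rank 0 using point-to-point messages. This is only an illustration of the logarithmic step count, not necessarily what your MPI library actually does; it assumes the variables my_id, num_procs, and num from the question's program are in scope:

    // Illustrative only: reduce num into rank 0 with MPI_SUM semantics
    // in ceil(log2(num_procs)) rounds.
    int value = num, partner_value;
    for (int step = 1; step < num_procs; step *= 2) {
        if (my_id % (2 * step) == step) {
            // This rank's subtree is done; hand the partial sum up the tree.
            MPI_Send(&value, 1, MPI_INT, my_id - step, 0, MPI_COMM_WORLD);
            break;
        } else if (my_id % (2 * step) == 0 && my_id + step < num_procs) {
            MPI_Recv(&partner_value, 1, MPI_INT, my_id + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            value += partner_value;
        }
    }
    // On rank 0, value now holds the sum of num across all ranks.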
So if you want the result at rank 0, you indeed have to call the reductions unconditionally on all ranks, like this:
MPI_Reduce(&num, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &product, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
MPI_Reduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, 0, MPI_COMM_WORLD);
If you need the result at a different rank, you can change the root parameter accordingly. If you want the result to be available at all ranks, use MPI_Allreduce instead.
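For example, here is a sketch of the MPI_Allreduce variant, reusing the variable names from the question. Note there is no root argument, because every rank receives the result:

    // Every rank ends up with all five results.
    MPI_Allreduce(&num, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&num, &product, 1, MPI_INT, MPI_PROD, MPI_COMM_WORLD);
    MPI_Allreduce(&num, &max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    MPI_Allreduce(&num, &min, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
    MPI_Allreduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, MPI_COMM_WORLD);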
Upvotes: 2
Reputation: 140445
I figured out what's wrong with your program by experimentation, and based on that, I have a hypothesis as to why it's wrong.
This modified version of your program does what you expected it to do:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_id;
    int num_procs;
    int num;
    int sum = 0;
    int product = 0;
    int max = 0;
    int min = 0;
    int bitwiseAnd = 0;
    int seed = time(0);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    srand(seed * my_id);
    num = rand() % 100;
    printf("Process #%i: Here is num: %i\n", my_id, num);

    /* Every rank participates in every reduction. */
    MPI_Reduce(&num, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&num, &product, 1, MPI_INT, MPI_PROD, 0, MPI_COMM_WORLD);
    MPI_Reduce(&num, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&num, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&num, &bitwiseAnd, 1, MPI_INT, MPI_BAND, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);

    if (my_id == 0) {
        printf("The sum is %i\n", sum);
        printf("The product is %i\n", product);
        printf("The max is %i\n", max);
        printf("The min is %i\n", min);
        printf("The bitwiseAnd is %i\n", bitwiseAnd);
    }

    MPI_Finalize();
    return 0;
}
Many of the changes I made are just cosmetic. The change that makes the difference is that all processes must execute all of the MPI_Reduce calls in order for all of the results to be computed.

Now, why does that matter? I must emphasize that this is a hypothesis; I do not know for certain. But an explanation that fits the available facts is: in both my and your implementations of MPI, the actual computation in an MPI_Reduce call happens only on the root process, but all the other processes must also call MPI_Reduce in order to send a message with their values. That message does not depend on the operation argument. So the MPI_SUM call did what it was supposed to do by accident, because the other calls to MPI_Reduce supplied the values it needed, but none of the other calls performed any computation at all.
If my hypothesis is correct, you're going to need to structure your program quite a bit differently if you want each computation carried out in a different process. Abstractly, you want an all-to-all broadcast so that all processes have all the numbers, then local computation of the sum, product, etc., then an all-to-one send of the results back to the root. If I'm reading http://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/#mpi_allgather-and-modification-of-average-program correctly, MPI_Allgather is the name of the function that does all-to-all broadcasts.
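As a rough illustration of that restructuring (my sketch, untested; it assumes exactly 5 ranks, matching your mpirun -np 5 command):

    // Each rank contributes num; afterwards all_nums on every rank
    // holds the numbers from ranks 0..4.
    int all_nums[5];
    MPI_Allgather(&num, 1, MPI_INT, all_nums, 1, MPI_INT, MPI_COMM_WORLD);

    // Each rank then computes a different operation locally, e.g.:
    if (my_id == 0) {
        int s = 0;
        for (int i = 0; i < 5; i++) s += all_nums[i];
        printf("Process 0 computed the sum: %i\n", s);
    } else if (my_id == 1) {
        int p = 1;
        for (int i = 0; i < 5; i++) p *= all_nums[i];
        printf("Process 1 computed the product: %i\n", p);
    }
    // ...and similarly for max, min, and bitwise AND on ranks 2-4.
    // A final MPI_Gather (or point-to-point sends) could then collect
    // the five results back at rank 0.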
Upvotes: 2