suifengls

Reputation: 31

MPI: several broadcasts at the same time

I have a 2D processor grid (3*3):

P00, P01, P02 are in R0; P10, P11, P12 are in R1; P20, P21, P22 are in R2. P*0 are on the same computer, and likewise for P*1 and P*2.

Now I would like R0, R1 and R2 to call MPI_Bcast at the same time, broadcasting from P*0 to P*1 and P*2.

I find that when I use MPI_Bcast this way, it takes three times as long as broadcasting in only one row.

For example, if I only call MPI_Bcast in R0, it takes 1.00 s, but if I call MPI_Bcast three times, once in each of R0, R1 and R2, it takes 3.00 s in total. This means the broadcasts do not run in parallel.

Is there any way to make the MPI_Bcast calls run at the same time? (One node broadcasting over three channels at the same time.)

Thanks.

Upvotes: 1

Views: 2851

Answers (2)

Hristo Iliev

Reputation: 74475

If I understand your question right, you would like to have simultaneous row-wise broadcasts:

P00 -> P01 & P02
P10 -> P11 & P12
P20 -> P21 & P22

This could be done using subcommunicators, e.g. one that only has processes from row 0 in it, another one that only has processes from row 1 in it and so on. Then you can issue simultaneous broadcasts in each subcommunicator by calling MPI_Bcast with the appropriate communicator argument.

Creating row-wise subcommunicators is extremely easy if you use a Cartesian communicator in the first place. MPI provides the MPI_CART_SUB operation for that. It works like this:

// Create a 3x3 non-periodic Cartesian communicator from MPI_COMM_WORLD
int dims[2] = { 3, 3 };
int periods[2] = { 0, 0 };
MPI_Comm comm_cart;

// We do not want MPI to reorder our processes
// That's why we set reorder = 0
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_cart);

// Split the Cartesian communicator row-wise
int remaindims[2] = { 0, 1 };
MPI_Comm comm_row;

MPI_Cart_sub(comm_cart, remaindims, &comm_row);

Now comm_row will contain a handle to a new subcommunicator that only spans the row that the calling process is in. It now takes only a single call to MPI_Bcast to perform three simultaneous row-wise broadcasts:

MPI_Bcast(&data, data_count, MPI_DATATYPE, 0, comm_row);

This works because comm_row, as returned by MPI_Cart_sub, will be different in processes located in different rows. The 0 here is the rank of the first process in the comm_row subcommunicator, which corresponds to P*0 because of the way the topology was constructed.
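
For completeness, here is a minimal, compile-ready sketch that puts the two calls together. The 3x3 grid, the single double value and the way P*0 fills it are assumptions made purely for illustration; it must be launched with exactly 9 processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // 3x3 non-periodic grid, no process reordering
    int dims[2] = { 3, 3 };
    int periods[2] = { 0, 0 };
    MPI_Comm comm_cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_cart);

    // Keep only the column dimension -> one subcommunicator per row
    int remaindims[2] = { 0, 1 };
    MPI_Comm comm_row;
    MPI_Cart_sub(comm_cart, remaindims, &comm_row);

    // Rank 0 in each row communicator corresponds to P*0
    int row_rank;
    MPI_Comm_rank(comm_row, &row_rank);
    double data = (row_rank == 0) ? 100.0 + rank : 0.0;

    // All three row-wise broadcasts run at the same time
    MPI_Bcast(&data, 1, MPI_DOUBLE, 0, comm_row);

    printf("world rank %d got %.1f\n", rank, data);

    MPI_Finalize();
    return 0;
}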

If you do not use a Cartesian communicator but operate on MPI_COMM_WORLD instead, you can use MPI_COMM_SPLIT to split the world communicator into three row-wise subcommunicators. MPI_COMM_SPLIT takes a color that is used to group processes into new subcommunicators: processes with the same color end up in the same subcommunicator. In your case the color should equal the number of the row that the calling process is in. The splitting operation also takes a key that is used to order processes in the new subcommunicator. It should equal the number of the column that the calling process is in, e.g.:

// Compute grid coordinates based on the rank
int proc_row = rank / 3;
int proc_col = rank % 3;
MPI_Comm comm_row;

MPI_Comm_split(MPI_COMM_WORLD, proc_row, proc_col, &comm_row);

Once again comm_row will contain the handle of a subcommunicator that only spans the same row as the calling process.

Upvotes: 5

Greg Inozemtsev

Reputation: 4671

The MPI-3.0 draft includes a non-blocking MPI_Ibcast collective. While the non-blocking collectives aren't officially part of the standard yet, they are already available in MPICH2 and (I think) in OpenMPI.
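
A rough sketch of how that would look, assuming this process is a member of three row communicators R[0], R[1], R[2] and has three buffers buf[0..2] to broadcast (these names are placeholders, not from the question):

MPI_Request reqs[3];

// Start all three broadcasts; none of them blocks
for (int i = 0; i < 3; i++)
    MPI_Ibcast(buf[i], count, MPI_DOUBLE, 0, R[i], &reqs[i]);

// Optionally overlap other work here, then complete them all
MPI_Waitall(3, reqs, MPI_STATUSES_IGNORE);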

Alternatively, you could start the blocking MPI_Bcast calls from separate threads (I'm assuming R0, R1 and R2 are different communicators).
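
A hedged sketch of the threaded variant, again with the hypothetical R[0..2] and buf[0..2] from above. Note that calling MPI from several threads at once requires initializing with MPI_THREAD_MULTIPLE and an MPI library that actually provides that level:

// During startup, request full thread support
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
// If provided < MPI_THREAD_MULTIPLE, fall back to sequential broadcasts

// Later: one OpenMP thread per row communicator, each doing a blocking Bcast
#pragma omp parallel for num_threads(3)
for (int i = 0; i < 3; i++)
    MPI_Bcast(buf[i], count, MPI_DOUBLE, 0, R[i]);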

A third possibility (which may or may not be possible) is to restructure the data so that only one broadcast is needed.

Upvotes: 1
