Reputation: 41
I have a Finite Element code that uses blocking receives and non-blocking sends. Each element has 3 incoming faces and 3 outgoing faces. The mesh is split among many processors, so the data for an incoming face sometimes comes from a boundary condition on the element's own processor and sometimes from a neighboring processor. The relevant parts of the code are:
std::vector<task>::iterator it = All_Tasks.begin();
std::vector<task>::iterator it_end = All_Tasks.end();
int task = 0;
for (; it != it_end; it++, task++)
{
    for (int f = 0; f < 3; f++)
    {
        // Get the neighbors for each incoming face
        Neighbor neighbor = subdomain.CellSets[(*it).cellset_id_loc].neighbors[incoming[f]];

        // Get buffers from boundary conditions or neighbor processors
        if (neighbor.processor == rank)
        {
            subdomain.Set_buffer_from_bc(incoming[f]);
        }
        else
        {
            // Get the tag of the corresponding send
            target = GetTarget((*it).angle_id, (*it).group_id, (*it).cell_id);
            if (incoming[f] == x)
            {
                int size = cells_y * cells_z * groups * angles * 4;
                MPI_Status status;
                MPI_Recv(&subdomain.X_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
            if (incoming[f] == y)
            {
                int size = cells_x * cells_z * groups * angles * 4;
                MPI_Status status;
                MPI_Recv(&subdomain.Y_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
            if (incoming[f] == z)
            {
                int size = cells_x * cells_y * groups * angles * 4;
                MPI_Status status;
                MPI_Recv(&subdomain.Z_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &status);
            }
        }
    }

    ... computation ...

    for (int f = 0; f < 3; f++)
    {
        // Get the outgoing neighbors for each face
        Neighbor neighbor = subdomain.CellSets[(*it).cellset_id_loc].neighbors[outgoing[f]];
        if (neighbor.IsOnBoundary)
        {
            // store the buffer into the boundary information
        }
        else
        {
            target = GetTarget((*it).angle_id, (*it).group_id, neighbor.cell_id);
            if (outgoing[f] == x)
            {
                int size = cells_y * cells_z * groups * angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.X_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);
            }
            if (outgoing[f] == y)
            {
                int size = cells_x * cells_z * groups * angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.Y_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);
            }
            if (outgoing[f] == z)
            {
                int size = cells_x * cells_y * groups * angles * 4;
                MPI_Request request;
                MPI_Isend(&subdomain.Z_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &request);
            }
        }
    }
}
A processor can work through a lot of tasks before it needs information from other processors. I need non-blocking sends so that the code can keep working, but I'm pretty sure the receives are overwriting the send buffers before the sends actually complete.
I've timed this code, and a single call to MPI_Recv is taking 5-6 seconds, even though the message it's trying to receive has already been sent. My theory is that the Isend is starting but not actually transferring anything until the matching Recv is called. Each message is on the order of 1 MB, and from the benchmarks I've looked at, a message of that size should take a very small fraction of a second to send.
My question is: in this code, is the buffer that was handed to the Isend being overwritten, or only a local copy of it? Is there a way to 'append' to a buffer when I'm sending, rather than writing to the same memory location every time? I want each Isend to use a different buffer so the data isn't overwritten while the messages wait to be received.
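To make concrete what I'm after, something like the pattern below is what I have in mind: give every outstanding Isend its own buffer, keep the request next to it, and only reuse or free the buffer once the request has completed. (The SendSlot struct and the helper functions are just an illustration of the idea, not my actual code.)

#include <mpi.h>
#include <cstddef>
#include <vector>

// One heap-allocated buffer per in-flight send, paired with its request.
struct SendSlot
{
    std::vector<double> data;
    MPI_Request request;
};

std::vector<SendSlot*> in_flight;   // sends that have not completed yet

// Copy the face data into a fresh buffer and post a non-blocking send.
void post_face_send(const double* face_data, int count, int dest, int tag)
{
    SendSlot* slot = new SendSlot;
    slot->data.assign(face_data, face_data + count);   // private copy, so the original can be reused
    MPI_Isend(slot->data.data(), count, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &slot->request);
    in_flight.push_back(slot);
}

// Called once in a while (e.g. once per task) to reclaim buffers whose sends have finished.
void reclaim_completed_sends()
{
    for (std::size_t i = 0; i < in_flight.size(); )
    {
        int done = 0;
        MPI_Test(&in_flight[i]->request, &done, MPI_STATUS_IGNORE);
        if (done)   // MPI is finished reading this buffer
        {
            delete in_flight[i];
            in_flight[i] = in_flight.back();
            in_flight.pop_back();
        }
        else
        {
            ++i;
        }
    }
}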
** EDIT ** A related question that might fix my problem: can MPI_Test or MPI_Wait tell me whether an MPI_Isend is finished with its buffer, i.e. return true once the send buffer is safe to reuse, even if the message has not yet been received?
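From what I can tell from the MPI standard, completing the send request (via MPI_Wait, or MPI_Test returning a true flag) means exactly that: MPI is done reading the send buffer and it is safe to reuse, although it says nothing about whether the receiver has the message yet. In my code the MPI_Request from each Isend currently goes out of scope, so there is nothing left to test; I would have to keep it around, roughly like this (this fragment reuses size, neighbor, and target from the snippet above):

MPI_Request x_request;
MPI_Isend(&subdomain.X_buffer[0], size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &x_request);

// ... later, before overwriting subdomain.X_buffer again ...
int flag = 0;
MPI_Test(&x_request, &flag, MPI_STATUS_IGNORE);
if (!flag)
{
    // MPI is still reading the buffer: either wait here or switch to another buffer.
    MPI_Wait(&x_request, MPI_STATUS_IGNORE);
}
// From this point on it is safe to overwrite subdomain.X_buffer.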
** EDIT 2 ** I have added more information about my problem.
Reputation: 41
So it looks like I just have to bite the bullet and allocate enough memory in the send buffers to hold all of the outstanding messages, and then send a different portion of that buffer each time I send.
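Roughly what I mean, as a sketch rather than my real code (num_tasks, the message index m, and the uniform msg_size are placeholders; in practice the X/Y/Z faces have different sizes):

#include <mpi.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// One big pool with a dedicated slice per message, so later tasks can't overwrite
// data that is still waiting to be sent.
const int msg_size = cells_y * cells_z * groups * angles * 4;   // size of one face message
const int num_msgs = num_tasks * 3;                             // upper bound on in-flight sends
std::vector<double> send_pool(static_cast<std::size_t>(msg_size) * num_msgs);
std::vector<MPI_Request> send_requests(num_msgs, MPI_REQUEST_NULL);

// When posting message number m, copy the face data into its own slice of the pool.
double* slice = &send_pool[static_cast<std::size_t>(m) * msg_size];
std::copy(&subdomain.X_buffer[0], &subdomain.X_buffer[0] + msg_size, slice);
MPI_Isend(slice, msg_size, MPI_DOUBLE, neighbor.processor, target, MPI_COMM_WORLD, &send_requests[m]);

// After the task loop, make sure every send has completed before the pool is reused or freed.
MPI_Waitall(num_msgs, send_requests.data(), MPI_STATUSES_IGNORE);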