Reputation: 4564
I've spent some time thinking about a scheme to compute a matrix. Everything makes sense, but there's one last part that I'm not sure how to handle.
Here's what I intend to do (scenario):
Display the full matrix.
All the way to step five I'm good to go, but I don't know what to do past the barrier. None of the processors has the complete 10x5 matrix. In the beginning I figured I wouldn't need one; I wanted something like this:
foreach(procX in proc(0-9))
showColumn(procX)
But I don't know which processor will be called after MPI_Barrier, and I don't know how to printf things with the order of the processors in mind (otherwise the matrix will not be printed correctly).
Does anyone have an idea on how this is normally dealt with? I've read a lot about letting each processor work on parts of the matrix, but I couldn't find anything on how to combine those different parts.
I'm not using scatters in my code (i.e., not using the master-slave technique).
Thanks
Upvotes: 0
Views: 203
Reputation: 74355
While it is possible to have each rank print its own local portion and synchronise as in the answer given by suszterpatt, that approach relies on features that are not part of the MPI standard; it is therefore not guaranteed to always work and is recommended only for diagnostic use.
First, there are machines (and corresponding MPI implementations) that do not allow any rank other than rank 0 to write to standard output. Second, even when the standard output of all ranks gets redirected to the mpiexec process, it is usually buffered. It is therefore possible that the output from rank i+1 gets displayed before the output from rank i, even when the order of the prints is enforced using barriers. In other words, MPI does not provide an explicit flush operation (e.g. something like fflush(stdout)) that could force the local standard output buffer to be sent to mpiexec and displayed by the latter; it is up to each implementation to decide how to handle it.
A more standards-compliant way to do it is to have each rank send its part of the matrix to rank 0, which in turn prints it:
if (rank == 0)
{
allocate temp_buffer;
for (i = 1; i < num_procs; i++)
{
receive from rank i into temp_buffer
print temp_buffer
}
}
else
{
send local matrix part to rank 0
}
That way no explicit barrier synchronisation is needed, and since only one process produces output, the order of the displayed lines is guaranteed.
Another option is to have each rank print to a local string buffer - e.g. using sprintf - and then send that buffer to rank 0 and have the latter display the received string.
Upvotes: 2
Reputation: 8273
If you don't want to gather the entire matrix on a single process, you can use a loop like this:
int numProcs;
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);   /* i < 9 would skip the last of 10 ranks */
for ( i = 0; i < numProcs; i++ )
{
    if ( i == myRank )
    {
        // print local matrix
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
Upvotes: 1