Patrick.SE

Reputation: 4564

Gathering small matrix portions from different processes with MPI

I've spent some time thinking about a scheme to compute a matrix. Everything makes sense, but there's one last part that I'm not sure how to handle.

Here's what I intend to do (scenario):

  1. I'm asked to compute a 10 (wide) x 5 (high) matrix.
  2. I have 10 processors available.
  3. Declare a 1x5 sub-matrix on each processor, including the process of rank 0.
  4. Compute each sub-matrix on its processor, using offsets (a rough sketch of steps 3-5 follows this list).
  5. MPI_Barrier to wait for all 10 processors to finish computing.
  6. Display the full matrix.
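For concreteness, steps 3 to 5 might look roughly like this in C (the element type and the placeholder computation are just illustrative, not my actual code):

#include <mpi.h>

#define W 10  /* matrix width  = number of processes */
#define H 5   /* matrix height = length of each local column */

int main(int argc, char **argv)
{
    double column[H];   /* step 3: local 1x5 column on every rank */
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* step 4: the rank doubles as the column offset into the 10x5 matrix */
    for (i = 0; i < H; i++)
        column[i] = rank * H + i;   /* placeholder computation */

    /* step 5: wait for all ranks to finish computing */
    MPI_Barrier(MPI_COMM_WORLD);

    /* step 6: display the full matrix - this is the part I'm unsure about */

    MPI_Finalize();
    return 0;
}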

All the way to step five I'm good to go, but I don't know what to do past the barrier. None of the processors has the complete 10x5 matrix. In the beginning I figured I wouldn't need one; I wanted something like this:

      foreach(procX in proc(0-9))
          showColumn(procX)
    

But I don't know which processor will run first after MPI_Barrier, and I don't know how to order the printf calls across processors (otherwise the matrix will not be printed correctly).

Does anyone have an idea of how this is normally handled? I've read a lot about letting each processor work on parts of the matrix, but I couldn't find anything on how to combine those different parts.

I'm not using scatters in my code (i.e., not using the master-slave technique).

Thanks

Upvotes: 0

Views: 203

Answers (2)

Hristo Iliev

Reputation: 74355

While it is possible to have each rank print its own local portion and synchronise as in the answer given by suszterpatt, that approach relies on behaviour that is not part of the MPI standard; it is therefore not guaranteed to always work and is only recommended for diagnostic use.

First, there are machines and their corresponding MPI implementations that do not allow any rank other than rank 0 to send to its standard output. Second, even when the standard output of all ranks gets redirected to the mpiexec process, it is usually buffered. Therefore it is possible that the standard output from rank i+1 might get displayed before the output from rank i even when the order of the prints is enforced using barriers. In other words, MPI does not provide an explicit flush operation (e.g. something like fflush(stdout)) that could force the local standard output buffer to be sent to mpiexec and displayed by the latter. It is up to the implementations to decide how to handle it.

A much more standards-compliant way to do it would be to have each rank send its part of the matrix to rank 0, which in turn prints it:

if (rank == 0)
{
   allocate temp_buffer;
   for (i = 1; i < num_procs; i++)
   {
       receive from rank i into temp_buffer
       print temp_buffer
   }
}
else
{
   send local matrix part to rank 0
}

That way no explicit barrier synchronisation is used, and since only one process produces output, the order of the displayed lines is guaranteed.
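A minimal C sketch of that scheme, assuming each rank holds one column of H doubles as in the question (the function and buffer names and the message tag are illustrative):

#include <stdio.h>
#include <mpi.h>

#define H 5   /* height of each local column, as in the question */

/* Rank 0 prints its own column, then receives and prints every other
 * rank's column in rank order; all other ranks just send their column. */
void print_matrix(double *my_column, int rank, int num_procs)
{
    double temp_buffer[H];
    int i, j;

    if (rank == 0)
    {
        for (i = 0; i < num_procs; i++)
        {
            if (i == 0)
                for (j = 0; j < H; j++) temp_buffer[j] = my_column[j];
            else
                MPI_Recv(temp_buffer, H, MPI_DOUBLE, i, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            for (j = 0; j < H; j++)
                printf("%6.2f ", temp_buffer[j]);
            printf("\n");
        }
    }
    else
    {
        MPI_Send(my_column, H, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
}

Note that this prints one rank's column per output line, i.e. the transpose of the 10x5 matrix; if the original row layout matters, rank 0 could first collect all columns into a full array (e.g. with MPI_Gather) and then print it row by row.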

Another option is to have each rank print to a local string buffer - e.g. using sprintf - and then send the buffer to rank 0 and have the latter display the received string.
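A sketch of that variant, again assuming one column of H doubles per rank (the buffer size and formatting are arbitrary choices):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define H 5
#define LINE_LEN 256   /* generous upper bound for one formatted column */

void print_matrix_strings(double *my_column, int rank, int num_procs)
{
    char line[LINE_LEN] = "";
    int i, pos = 0;

    /* each rank formats its own column into a local string buffer */
    for (i = 0; i < H; i++)
        pos += sprintf(line + pos, "%6.2f ", my_column[i]);

    if (rank == 0)
    {
        char recv_line[LINE_LEN];

        printf("%s\n", line);   /* rank 0's own part first */
        for (i = 1; i < num_procs; i++)
        {
            MPI_Recv(recv_line, LINE_LEN, MPI_CHAR, i, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", recv_line);
        }
    }
    else
    {
        /* send the formatted string, including the terminating '\0' */
        MPI_Send(line, (int)strlen(line) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
}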

Upvotes: 2

suszterpatt

Reputation: 8273

If you don't want to gather the entire matrix on a single process, you can use a loop like this:

int i, myRank, numProcs;
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

for ( i = 0; i < numProcs; i++ )
{
    if ( i == myRank )
    {
        // print local matrix
    }
    MPI_Barrier(MPI_COMM_WORLD);
}

Upvotes: 1
