Fuji San

Reputation: 135

MPI Send/Recv millions of messages

I have a loop over NT (millions of iterations) that runs on every proc greater than 0. On each iteration a 120-byte message is sent to proc 0, which runs the same loop over NT to receive them.

I want proc 0 to receive them in order so I can store them in array nhdr1.

The problem is that proc 0 does not receive the messages properly, and I often have 0 values in array nhdr.

How can I modify the code so that the messages are received in the same order as they were sent?

[...]
    if (rank == 0) {

        nhdr  = malloc((unsigned long)15*sizeof(*nhdr));
        nhdr1 = malloc((unsigned long)NN*15*sizeof(*nhdr1));

        itr = 0;
        jnode = 1;

        for (l=0; l<NT; l++) {

            MPI_Recv(nhdr, 15, MPI_LONG, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

            if (l == status.MPI_TAG) {
                for (i=0; i<nkeys; i++)
                    nhdr1[itr*15+i] = nhdr[i];
            }

            itr++;

            if (itr == NN) {
                ipos = (unsigned long)(jnode-1)*NN*15*sizeof(*nhdr1);

                fseek(ismfh, ipos, SEEK_SET);
                nwrite += fwrite(nhdr1, sizeof(*nhdr1), NN*15, ismfh);

                itr = 0;
                jnode++;
            }
        }

        free(nhdr);
        free(nhdr1);

    } else {

        nhdr = malloc(15*sizeof(*nhdr));

        irecmin = (rank-1)*NN+1;
        irecmax = rank*NN;

        for (l=0; l<NT; l++) {
            if (jrec[l] >= irecmin && jrec[l] <= irecmax) {

                indx1 = (unsigned long)(jrec[l]-irecmin) * 15;

                for (i=0; i<15; i++)
                    nhdr[i] = nhdr1[indx1+i]; // nhdr1 is allocated before for rank>0!

                MPI_Send(nhdr, 15, MPI_LONG, 0, l, MPI_COMM_WORLD);
            }
        }

        free(nhdr);

    }

Upvotes: 0

Views: 169

Answers (3)

Sigi

Reputation: 4926

Keep in mind that:

  • messages from a given rank are received in the order they were sent, and
  • each message carries the originating rank in the status structure (status.MPI_SOURCE) returned by MPI_Recv().

You can use these two facts to place the received data correctly in nhdr1.
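
As a rough sketch of that bookkeeping (index arithmetic only, MPI calls left out): the helper name next_slot and the received counter array are hypothetical, and the layout is an assumption not stated in the question, namely that rank r owns a contiguous block of nn records and that nhdr1 on rank 0 is large enough to hold every block. It does not reproduce the question's flush-to-file logic.

```c
#include <assert.h>
#include <stddef.h>

#define NKEYS 15  /* longs per message, as in the question */

/*
 * Hypothetical helper: return the offset (in elements) into nhdr1 where
 * the next message from `source` belongs, then advance that source's
 * counter.
 *
 * Assumptions (not from the original post):
 *   - worker ranks are 1, 2, ... and rank r owns block r-1 of nn records;
 *   - received[r-1] counts messages already received from rank r.
 * Since messages from one sender arrive in the order sent, the k-th
 * message received from rank r is the k-th record that rank sent.
 */
size_t next_slot(size_t *received, int source, size_t nn)
{
    assert(received != NULL && source >= 1);

    size_t block = (size_t)(source - 1);  /* block owned by this rank */
    size_t k = received[block]++;         /* position inside the block */

    assert(k < nn);                       /* a rank sends at most nn records */
    return (block * nn + k) * NKEYS;
}
```

On rank 0 this would be called right after each MPI_Recv, passing status.MPI_SOURCE as source, and the 15 received values would be copied to nhdr1 starting at the returned offset.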

Upvotes: 0

Stan Graves

Reputation: 6955

What do you mean by "messages are received in the same order as they were sent"?

In the code as it stands, the messages ARE received in (roughly) the order in which they are actually sent, but that order has nothing to do with the rank numbers, or really anything else. See @Wesley Bland's response for more on this.

If you mean "receive the messages in rank order"...then there are a few options.

First, a collective like MPI_Gather or MPI_Gatherv would be an "obvious" choice to ensure that the data is ordered by the rank that produced it. This only works if each rank does the same number of iterations, and those iterations stay roughly sync'd.
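
For MPI_Gatherv, the root passes a recvcounts array and a displs array (both in units of the receive datatype). Starting each rank's data right after the previous rank's gives the rank-ordered layout. A minimal sketch of that displacement computation follows; the helper name fill_displs is hypothetical and the MPI call itself is omitted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the number of elements each rank contributes
 * (recvcounts), fill displs so that rank r's data starts immediately after
 * rank r-1's. Returns the total element count, i.e. the size the receive
 * buffer on the root needs to have.
 */
int fill_displs(const int *recvcounts, int *displs, int nprocs)
{
    assert(recvcounts != NULL && displs != NULL && nprocs > 0);

    int total = 0;
    for (int r = 0; r < nprocs; r++) {
        displs[r] = total;       /* rank r starts where rank r-1 ended */
        total += recvcounts[r];
    }
    return total;
}
```

Because these counts are plain int values, a transfer of millions of 15-element messages may have to be gathered in batches to stay within range.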

Second, you could remove the MPI_ANY_SOURCE and post a set of MPI_Irecv calls with the buffers supplied "in order". When a message arrives, it will land in the correct buffer location "automatically." For each message that is received, a new MPI_Irecv could be posted with the correct receive buffer location supplied. Any unmatched MPI_Irecv calls would need to be canceled at the end of the job.

Upvotes: 1

Wesley Bland

Reputation: 9072

There is no way to guarantee that your messages will arrive on rank 0 in the same order they were sent from different ranks. For example, if you have a scenario like this (S1 means send message 1) :

rank 0 ----------------
rank 1 ---S1------S3---
rank 2 ------S2------S4

There is no guarantee that the messages will arrive at rank 0 in the order S1, S2, S3, S4. The only guarantee made by MPI is that messages from a given rank that are sent on the same communicator and match the same receive (which is the case here, since rank 0 receives with MPI_ANY_TAG) will arrive in the same order they were sent. This means that the resulting order could be:

S1, S2, S3, S4

Or it could be:

S1, S3, S2, S4

or:

S2, S1, S3, S4

...and so on.
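
To make the guarantee concrete, here is a small sketch of a checker that accepts exactly those arrival orders in which each sender's own messages stay in sequence. The function name and the sender/seq encoding are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical checker: sender[i] is the rank that sent the i-th message
 * to arrive, and seq[i] is that message's position in its sender's own
 * send order (0, 1, 2, ...). Returns 1 if every sender's messages arrived
 * in the order they were sent, 0 otherwise. Messages from different
 * senders may interleave freely.
 */
int respects_sender_order(const int *sender, const int *seq, size_t n)
{
    assert(n == 0 || (sender != NULL && seq != NULL));

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            /* an earlier arrival from the same sender must be older */
            if (sender[j] == sender[i] && seq[j] >= seq[i])
                return 0;
    return 1;
}
```

With rank 1 sending S1 then S3 and rank 2 sending S2 then S4, the order S1, S3, S2, S4 passes this check, while S3, S1, S2, S4 does not, because S3 would overtake S1 from the same sender.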

For most applications, this doesn't really matter. The ordering that's important is the logical ordering, not the real time ordering. You might take another look at your application and make sure you can't relax your requirements a bit.

Upvotes: 4
