Robet

Reputation: 79

MPI - Bad communication between processes

I'm trying to pass a frequency vector up through a set of processes, updating it along the way. The processes communicate in a tree topology:

0: 1 2
1: 0 3 4 5 6
2: 0 7 8
3: 1
4: 1 9 10
5: 1
6: 1
7: 2
8: 2 11
9: 4
10: 4
11: 8

Basically, rank 0 can communicate only with ranks 1 and 2, rank 1 only with ranks 0, 3, 4, 5 and 6, and so on. In the end, rank 0 should hold a frequency vector that sums the values from all other ranks.

if (rank == 0) {
    for (i = 0; i < nr_elements; i++) {
        MPI_Recv(local_frequency, num_alphabets, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &status);
        printf("[RANK %d]Received from %d\n", rank, neigh[i]);
        for (i = 0; i < num_alphabets; i++) {
            frequency[i] += local_frequency[i];
        }
    }
}
else {
    // leaf
    if (nr_elements == 1) {
        MPI_Send(frequency, num_alphabets, MPI_INT, parent, 0, MPI_COMM_WORLD);
        printf("[RANK %d]Sent to %d\n", rank, parent);
    }
    else {
        // first we receive
        for (i = 0; i < nr_elements; i++) {
            if (neigh[i] != parent) {
                MPI_Recv(local_frequency, num_alphabets, MPI_INT, neigh[i], 0, MPI_COMM_WORLD, &status);
                printf("[RANK %d]Received from %d\n", rank, neigh[i]);
                for (i = 0; i < num_alphabets; i++) {
                    frequency[i] += local_frequency[i];
                }
            }
        }
        MPI_Send(frequency, num_alphabets, MPI_INT, parent, 0, MPI_COMM_WORLD);
        printf("[RANK %d]Sent to %d\n", rank, parent);
    }
}

This is the result of their communication:

 - [RANK 2]Received from 7                                              
 - [RANK 2]Sent to 0                                                    
 - [RANK 3]Sent to 1                                                    
 - [RANK 6]Sent to 1                                                    
 - [RANK 7]Sent to 2                                                    
 - [RANK 4]Received from 9                                              
 - [RANK 4]Sent to 1                                                    
 - [RANK 5]Sent to 1                                                    
 - [RANK 9]Sent to 4                                                    
 - [RANK 0]Received from 1                                              
 - [RANK 1]Received from 3                                              
 - [RANK 1]Sent to 0                                                    
 - [RANK 10]Sent to 4                                                   
 - [RANK 11]Sent to 8                                                   
 - [RANK 8]Received from 11                                             
 - [RANK 8]Sent to 2

Every child sends its data to its parent, but apparently not all messages are received. However, if I remove the update operation after every MPI_Recv, everything works normally. Is there a problem with synchronization? What should I do?

Some things you should know:
   - num_alphabets = 256
   - parent and nr_elements are computed correctly
   - neigh is the neighbours vector

Upvotes: 0

Views: 170

Answers (1)

Richard

Reputation: 61479

Debugging

Compiling with -g and running in a debugger might help you figure out where the problem is. To do so, you can launch your MPI program as follows:

mpirun -n 4 xterm -hold -e gdb -ex run --args ./program [arg1] [arg2] [...]

This will open one terminal window per process, allowing you to inspect the memory and stack of each process independently.
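One thing worth watching in the debugger is the loop counter. In the posted code the inner accumulation loop reuses `i`, the same variable that drives the outer receive loop, so after the first update `i` equals num_alphabets and the outer loop ends. That would match the output (each rank logs a single receive) and the observation that removing the update makes the problem go away. A minimal sketch of the effect without MPI, where `fake_recv` and both function names are hypothetical stand-ins and not part of the original program:

```c
/* Hypothetical stand-in for MPI_Recv: fills buf with ones so the
 * example runs without MPI. */
static void fake_recv(int *buf, int n) {
    for (int k = 0; k < n; k++) {
        buf[k] = 1;
    }
}

/* Mirrors the posted loop: the inner loop reuses i, so after the first
 * accumulation i == num_alphabets and the outer test i < nr_elements
 * fails (assuming num_alphabets >= nr_elements, as with 256 here).
 * Returns the number of receives actually performed. */
int receives_with_shared_index(int nr_elements, int num_alphabets,
                               int *frequency, int *local_frequency) {
    int received = 0;
    int i;
    for (i = 0; i < nr_elements; i++) {
        fake_recv(local_frequency, num_alphabets);
        received++;
        for (i = 0; i < num_alphabets; i++) {   /* clobbers the outer i */
            frequency[i] += local_frequency[i];
        }
    }
    return received;
}

/* Same logic with a separate inner index j: every neighbour is handled. */
int receives_with_separate_index(int nr_elements, int num_alphabets,
                                 int *frequency, int *local_frequency) {
    int received = 0;
    for (int i = 0; i < nr_elements; i++) {
        fake_recv(local_frequency, num_alphabets);
        received++;
        for (int j = 0; j < num_alphabets; j++) {
            frequency[j] += local_frequency[j];
        }
    }
    return received;
}
```

With nr_elements = 3 and num_alphabets = 256, the first function performs 1 receive and the second performs 3.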

Blocking Send/Recvs

Since both MPI_Recv and MPI_Send are blocking, you can easily wind up in a situation where two processes are both sending, or both receiving, when one should be sending to the other. Neither call can then complete, and the program deadlocks. You can read about the dining philosophers problem and its relatives to get a better handle on situations like this. I'd also recommend adding debugging output that shows when each process is about to send and about to receive. You'll likely find a pair that are both trying to receive or both trying to send when they should be sending/receiving from each other.

Non-blocking Communication

A fix for the above is to use MPI's non-blocking send/receive calls, MPI_Isend and MPI_Irecv. These return immediately and are completed later with MPI_Wait, MPI_Waitany or MPI_Waitall, which avoids the deadlocks described above and is also handy when a process can do work while waiting for results from another process. That applies in your case: since you have a tree, a parent cannot be sure which child will return its results first.
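As a rough sketch of that idea, a parent could post one MPI_Irecv per child and then use MPI_Waitany to add up the vectors in whatever order they arrive. The helper name `gather_from_children` and the `children` / `num_children` parameters are assumptions made for illustration (the posted code keeps parent and children together in neigh), and tag 0 is taken from the posted calls:

```c
#include <mpi.h>
#include <stdlib.h>

/* Hypothetical helper: sums the vectors sent by all children into
 * frequency, processing them in completion order. */
void gather_from_children(int *frequency, int num_alphabets,
                          const int *children, int num_children,
                          MPI_Comm comm) {
    if (num_children == 0) {
        return;  /* leaf: nothing to receive */
    }

    /* One receive buffer and one request per child. */
    int *bufs = malloc((size_t)num_children * num_alphabets * sizeof *bufs);
    MPI_Request *reqs = malloc((size_t)num_children * sizeof *reqs);
    if (bufs == NULL || reqs == NULL) {
        MPI_Abort(comm, 1);
    }

    /* Post all receives up front; none of these calls block. */
    for (int c = 0; c < num_children; c++) {
        MPI_Irecv(bufs + (size_t)c * num_alphabets, num_alphabets, MPI_INT,
                  children[c], 0, comm, &reqs[c]);
    }

    /* Handle whichever child finishes next. MPI_Waitany sets a completed
     * request to MPI_REQUEST_NULL, so later calls skip it. */
    for (int done = 0; done < num_children; done++) {
        int idx;
        MPI_Waitany(num_children, reqs, &idx, MPI_STATUS_IGNORE);
        const int *part = bufs + (size_t)idx * num_alphabets;
        for (int j = 0; j < num_alphabets; j++) {
            frequency[j] += part[j];
        }
    }

    free(reqs);
    free(bufs);
}
```

After the helper returns, a non-root rank would forward its accumulated frequency to parent with MPI_Send, as in the posted code. Note that the inner loop uses its own index `j` rather than sharing one with the outer loop.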

Upvotes: 1
