TonyW

Reputation: 18895

Barrier call stuck in Open MPI (C program)

I am practicing barrier synchronization using Open MPI message passing. I have created an array of structs called containers. Each container is linked to its neighbor on the right, and the two elements at the ends are also linked to each other, forming a ring.

In the main() test driver, I run MPI with multiple processes (mpiexec -n 5 ./a.out), which are supposed to be synchronized by calling the container_barrier() function. However, my code gets stuck at the last process. I am looking for help debugging this. Please see my code below:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>

typedef struct container {
    int labels;                  
    struct container *linked_to_container;    
    int sense;
} container;

container *allcontainers;   /* an array for all containers */
int size_containers_array;

int get_next_container_id(int current_container_index, int max_index)
{
    if (max_index - current_container_index >= 1)
    {
        return current_container_index + 1;
    }
    else 
        return 0;        /* elements at two ends are linked */
}

container *get_container(int index)
{
    return &allcontainers[index];
}


void container_init(int num_containers)
{
    allcontainers = (container *) malloc(num_containers * sizeof(container));  /* is this right to malloc memory on the array of container when the struct size is still unknown?*/
    size_containers_array = num_containers;

    int i;
    for (i = 0; i < num_containers; i++)
    {
        container *current_container = get_container(i);
        current_container->labels = 0;
        int next_container_id = get_next_container_id(i, num_containers - 1);     /* max index in all_containers[] is num_containers-1 */
        current_container->linked_to_container = get_container(next_container_id);
        current_container->sense = 0;   
    }
}

void container_barrier()
{
    int current_container_id, my_sense = 1;
    int tag = current_container_id;
    MPI_Request request[size_containers_array];
    MPI_Status status[size_containers_array];

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    container *current_container = get_container(current_container_id);

    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);

    /* send asynchronous message to the next container, wait, then do blocking receive */
    MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &request[current_container_id]);
    MPI_Wait(&request[current_container_id], &status[current_container_id]);
    MPI_Recv(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

}

void free_containers()
{
    free(allcontainers);
}

int main(int argc, char **argv)
{
    int my_id, num_processes;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    container_init(num_processes);

    printf("Hello world from thread %d of %d \n", my_id, num_processes);
    container_barrier();
    printf("passed barrier \n");



    MPI_Finalize();
    free_containers();

    return 0;
}

Upvotes: 0

Views: 777

Answers (1)

Wesley Bland

Reputation: 9082

The problem is the series of calls:

MPI_Isend()
MPI_Wait()
MPI_Recv()

This is a common source of confusion. When you use a "nonblocking" call in MPI, you are essentially telling the MPI library that you want to do some operation (a send) with some data (my_sense). MPI gives you back an MPI_Request object with the guarantee that the operation will be finished by the time a completion call (MPI_Wait, MPI_Test, etc.) completes that MPI_Request.

The problem you have here is that you're calling MPI_Isend and immediately calling MPI_Wait before ever calling MPI_Recv on any rank. This means that all of those send calls get queued up but never actually have anywhere to go because you've never told MPI where to put the data by calling MPI_Recv (which tells MPI that you want to put the data in my_sense).

The reason this works part of the time is that MPI expects that things might not always sync up perfectly. If you send small messages (which you do), MPI reserves some buffer space and will let your MPI_Isend operations complete, stashing the data in that temporary space until you call MPI_Recv later to tell MPI where to move it. Eventually, though, this won't work anymore. The buffers will be full and you'll need to actually start receiving your messages. For you, this means that you need to switch the order of your operations. Instead of doing a non-blocking send, you should do a non-blocking receive first, then do your blocking send, then wait for your receive to finish:

MPI_Irecv()
MPI_Send()
MPI_Wait()
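
For concreteness, here is one way container_barrier() could look with the receive posted first. This is only a sketch of the reordering, not a drop-in fix: it keeps the single neighbor exchange of the original (which by itself is not a full barrier), it receives from the previous rank in the ring rather than the next one so that every send has a matching receive, and it uses a fixed tag of 0 because the original tag variable was read before MPI_Comm_rank had filled in the rank.

void container_barrier()
{
    int my_rank, num_ranks, my_sense = 1, neighbor_sense = 0;
    MPI_Request request;

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

    int next = (my_rank + 1) % num_ranks;              /* rank we send to */
    int prev = (my_rank + num_ranks - 1) % num_ranks;  /* rank we receive from */

    /* Post the receive first so the incoming message has somewhere to land. */
    MPI_Irecv(&neighbor_sense, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &request);

    /* The blocking send can now complete; its matching receive is already posted. */
    MPI_Send(&my_sense, 1, MPI_INT, next, 0, MPI_COMM_WORLD);

    /* Wait for our own receive to finish before leaving. */
    MPI_Wait(&request, MPI_STATUS_IGNORE);
}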

The other option is to turn both functions into nonblocking functions and use MPI_Waitall instead:

MPI_Isend()
MPI_Irecv()
MPI_Waitall()

This last option is usually the best. The only thing you'll need to be careful about is that you don't overwrite your own data. Right now you're using the same buffer for both the send and the receive operation. If both of these are happening at the same time, there are no guarantees about the ordering. Normally this doesn't make a difference: whether you send the message first or receive it doesn't really matter. However, in this case it does. If you receive data first, you'll end up sending the same data back out again instead of the data you had before the receive. You can solve this by using a separate buffer to stage your data and moving it to the right place when it's safe.
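
Here is a sketch of that fully nonblocking variant, with the same assumptions about the ring neighbors and the tag as in the sketch above. The important difference is the separate neighbor_sense buffer, so the incoming value cannot overwrite the value being sent while both operations are in flight.

void container_barrier()
{
    int my_rank, num_ranks, my_sense = 1, neighbor_sense = 0;
    MPI_Request requests[2];

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

    int next = (my_rank + 1) % num_ranks;
    int prev = (my_rank + num_ranks - 1) % num_ranks;

    /* Start both operations; neither one blocks, so their order doesn't matter. */
    MPI_Isend(&my_sense, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &requests[0]);
    MPI_Irecv(&neighbor_sense, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &requests[1]);

    /* Complete both before reusing either buffer. */
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
}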

Upvotes: 1
