PhilipTsv

Reputation: 13

Problem creating an MPI struct, error 11 when calling MPI_Bcast

I want to transfer a struct between processes, and for that I am trying to create an MPI struct datatype. The code is for an Ant Colony Optimization (ACO) algorithm.

The header file with the C struct contains:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <math.h>
    #include <mpi.h>

    /* Constants */
    #define NUM_CITIES 100      // Number of cities
    //among others

    typedef struct {
        int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
        double tour_distance;
    } ACO_Ant;

I tried to build my code as suggested in this thread.

Program code:

    int main(int argc, char *argv[])
    {
    MPI_Datatype MPI_TABU, MPI_PATH, MPI_ANT;

    // Initialize MPI
    MPI_Init(&argc, &argv);
    //Determines the size (&procs) of the group associated with a communicator (MPI_COMM_WORLD)
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    //Determines the rank (&rank) of the calling process in the communicator (MPI_COMM_WORLD)
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_TABU);
    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_PATH);
    MPI_Type_commit(&MPI_TABU);
    MPI_Type_commit(&MPI_PATH);

    // Create ant struct
    //int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    //double tour_distance;
    int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
    MPI_Datatype    types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
    MPI_Aint        offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};

    MPI_Datatype tmp_type;
    MPI_Aint lb, extent;

    MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
    MPI_Type_get_extent( tmp_type, &lb, &extent );
    //Tried all of these
    MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
    MPI_Type_commit(&MPI_ANT);

    printf("Return: %d\n" , MPI_Bcast(ant, NUM_ANTS, MPI_ANT, 0, MPI_COMM_WORLD));
    }

But once the program reaches the MPI_Bcast call, it crashes with error code 11, which I presume is MPI_ERR_TOPOLOGY as per this manual.

I am also unsure about part of the original program's code. Can someone explain why its author declares

    MPI_Aint displacements[3];
    MPI_Datatype typelist[3];

with size 3 when the struct has only 2 members, while block_lengths has size 2?

    int block_lengths[2];

Code:

    void ACO_Build_best(ACO_Best_tour *tour, MPI_Datatype *mpi_type /*out*/)
    {
        int block_lengths[2];
        MPI_Aint displacements[3];
        MPI_Datatype typelist[3];
        MPI_Aint start_address;
        MPI_Aint address;

        block_lengths[0] = 1;
        block_lengths[1] = NUM_CITIES;

        typelist[0] = MPI_DOUBLE;
        typelist[1] = MPI_INT;

        displacements[0] = 0;

        MPI_Address(&(tour->distance), &start_address);
        MPI_Address(tour->path, &address);
        displacements[1] = address - start_address;

        MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
        MPI_Type_commit(mpi_type);
    }

All and any help will be appreciated.
Edit: help with solving the problem, not marginally useful StackOverflow jargon

Upvotes: 1

Views: 143

Answers (1)

Hristo Iliev

Reputation: 74355

This part is wrong:

    int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
    MPI_Datatype    types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
    MPI_Aint        offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};

The MPI_TABU and MPI_PATH datatypes already cover NUM_CITIES elements. When you specify the corresponding block size to also be NUM_CITIES, the resultant datatype will try to access NUM_CITIES * NUM_CITIES elements, likely resulting in a segfault (signal 11).

Either set all elements of blocklengths to 1 or replace MPI_TABU and MPI_PATH in the types array with MPI_INT.
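The overrun can be checked with plain arithmetic, without MPI. The following is a minimal sketch: the struct is copied from the question, while the helper names `ints_described` and `tabu_overrun_bytes` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

#define NUM_CITIES 100

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* Number of ints one struct block describes: the block length multiplied
 * by the number of ints in each element of the block's datatype. */
size_t ints_described(size_t blocklength, size_t ints_per_element)
{
    assert(blocklength > 0 && ints_per_element > 0);
    return blocklength * ints_per_element;
}

/* Bytes past the end of one ACO_Ant that the tabu block would reach for a
 * given description; 0 means the block stays inside the struct. */
size_t tabu_overrun_bytes(size_t blocklength, size_t ints_per_element)
{
    size_t end = offsetof(ACO_Ant, tabu)
               + ints_described(blocklength, ints_per_element) * sizeof(int);
    return end > sizeof(ACO_Ant) ? end - sizeof(ACO_Ant) : 0;
}
```

With the question's combination (block length NUM_CITIES, element type covering NUM_CITIES ints) the tabu block describes NUM_CITIES * NUM_CITIES ints and runs far past the end of the struct. Either fix (block length 1 with MPI_TABU, or block length NUM_CITIES with MPI_INT) describes exactly NUM_CITIES ints and stays inside it.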

This part is also wrong:

    MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
    MPI_Type_get_extent( tmp_type, &lb, &extent );
    //Tried all of these
    MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
    MPI_Type_commit(&MPI_ANT);

Calling MPI_Type_create_resized with the values returned by MPI_Type_get_extent is meaningless since it just duplicates the type without actually resizing it. Using sizeof(MPI_ANT) is wrong since MPI_ANT is not a C type but an MPI handle, which is either an integer index or a pointer (implementation-dependent). It will work with sizeof(ant) if ant is of type ACO_Ant, but given you call MPI_Bcast(ant, NUM_ANTS, ...), then ant is either a pointer, in which case sizeof(ant) is just the pointer size, or it is an array, in which case sizeof(ant) is NUM_ANTS times larger than it must be. The correct call is:

    MPI_Type_create_resized(tmp_type, 0, sizeof(ACO_Ant), &ant_type);
    MPI_Type_commit(&ant_type);
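The sizeof cases can be compared in plain C. This is only a sketch: the question does not show how ant is declared or what NUM_ANTS is, so the value 20 and the helper names below are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */

#define NUM_CITIES 100
#define NUM_ANTS   20   /* assumed value; not shown in the question */

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* If ant is an array, sizeof(ant) covers all NUM_ANTS elements. */
size_t size_if_array(void)
{
    ACO_Ant ant[NUM_ANTS];
    return sizeof(ant);
}

/* If ant is a pointer, sizeof(ant) is only the size of the pointer. */
size_t size_if_pointer(void)
{
    ACO_Ant *ant = NULL;
    return sizeof(ant);
}

/* The extent that one element of the MPI datatype should have. */
size_t size_of_one_ant(void)
{
    return sizeof(ACO_Ant);
}

/* Sanity check tying the cases together. */
void check_sizes(void)
{
    assert(size_if_array() == NUM_ANTS * size_of_one_ant());
    assert(size_if_pointer() == sizeof(ACO_Ant *));
}
```

Only sizeof(ACO_Ant) matches the spacing between consecutive array elements, which is what the extent of the resized type has to equal when more than one element is sent.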

And please, never use MPI_ as a prefix in your own variable or function names. It makes the code harder to read and is very misleading ("is that a predefined MPI datatype or a user-defined one?").

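As for the last question, the author might have had a different structure in mind. Nothing stops you from declaring larger arrays than needed, as long as the count you pass to MPI_Type_struct (MPI_Type_create_struct in current MPI) matches the number of significant elements. Here that count is 2, so the third slot of displacements and typelist is simply never read.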

Note: You don't have to commit MPI datatypes that are never used directly in communication calls. I.e., those two lines are unnecessary:

    MPI_Type_commit(&MPI_TABU);
    MPI_Type_commit(&MPI_PATH);

Upvotes: 1
