Reputation: 13
I want to transport a struct between processes, and for that I am trying to create an MPI struct datatype. The code is for an Ant Colony Optimization (ACO) algorithm.
The header file with the C struct contains:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>
#include <mpi.h>
/* Constants */
#define NUM_CITIES 100 // Number of cities
//among others
typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;
I tried to build my code as suggested in this thread.
Program code:
int main(int argc, char *argv[])
{
    MPI_Datatype MPI_TABU, MPI_PATH, MPI_ANT;
    // Initialize MPI
    MPI_Init(&argc, &argv);
    //Determines the size (&procs) of the group associated with a communicator (MPI_COMM_WORLD)
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    //Determines the rank (&rank) of the calling process in the communicator (MPI_COMM_WORLD)
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_TABU);
    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_PATH);
    MPI_Type_commit(&MPI_TABU);
    MPI_Type_commit(&MPI_PATH);
    // Create ant struct
    //int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    //double tour_distance;
    int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
    MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
    MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
    MPI_Datatype tmp_type;
    MPI_Aint lb, extent;
    MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
    MPI_Type_get_extent( tmp_type, &lb, &extent );
    //Tried all of these
    MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
    MPI_Type_commit(&MPI_ANT);
    printf("Return: %d\n" , MPI_Bcast(ant, NUM_ANTS, MPI_ANT, 0, MPI_COMM_WORLD));
}
But once the program reaches the MPI_Bcast call, it crashes with error code 11, which is a segfault (signal 11).
I am also unsure about some of the code in the original program. Can someone explain why its author declares
MPI_Aint displacements[3];
MPI_Datatype typelist[3];
with size 3, when the struct has only 2 members and the block lengths array has size 2?
int block_lengths[2];
Code:
void ACO_Build_best(ACO_Best_tour *tour, MPI_Datatype *mpi_type /*out*/)
{
    int block_lengths[2];
    MPI_Aint displacements[3];
    MPI_Datatype typelist[3];
    MPI_Aint start_address;
    MPI_Aint address;
    block_lengths[0] = 1;
    block_lengths[1] = NUM_CITIES;
    typelist[0] = MPI_DOUBLE;
    typelist[1] = MPI_INT;
    displacements[0] = 0;
    MPI_Address(&(tour->distance), &start_address);
    MPI_Address(tour->path, &address);
    displacements[1] = address - start_address;
    MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
    MPI_Type_commit(mpi_type);
}
All and any help will be appreciated.
Edit: help with solving the problem, not marginally useful StackOverflow jargon
Upvotes: 1
Views: 143
Reputation: 74355
This part is wrong:
int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
MPI_Datatype types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};
The MPI_TABU and MPI_PATH datatypes already cover NUM_CITIES elements. When you specify the corresponding block size to also be NUM_CITIES, the resultant datatype will try to access NUM_CITIES * NUM_CITIES elements, likely resulting in a segfault (signal 11).
Either set all elements of blocklengths to 1 or replace MPI_TABU and MPI_PATH in the types array with MPI_INT.
This part is also wrong:
MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
MPI_Type_get_extent( tmp_type, &lb, &extent );
//Tried all of these
MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
//MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
MPI_Type_commit(&MPI_ANT);
Calling MPI_Type_create_resized with the values returned by MPI_Type_get_extent is meaningless since it just duplicates the type without actually resizing it. Using sizeof(MPI_ANT) is wrong since MPI_ANT is not a C type but an MPI handle, which is either an integer index or a pointer (implementation-dependent). It will work with sizeof(ant) if ant is of type ACO_Ant, but given you call MPI_Bcast(ant, NUM_ANTS, ...), then ant is either a pointer, in which case sizeof(ant) is just the pointer size, or it is an array, in which case sizeof(ant) is NUM_ANTS times larger than it must be. The correct call is:
MPI_Type_create_resized(tmp_type, 0, sizeof(ACO_Ant), &ant_type);
MPI_Type_commit(&ant_type);
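The three sizeof cases can be checked in plain C, without MPI. This is only an illustrative sketch: the function names are invented here, and NUM_ANTS is given an arbitrary value because the question does not show its definition.

```c
#include <stddef.h>

#define NUM_CITIES 100
#define NUM_ANTS 8   /* assumed value; not shown in the question */

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* ant declared as a pointer: sizeof yields the pointer size only. */
static size_t size_via_pointer(void)
{
    ACO_Ant *ant = NULL;
    return sizeof(ant);
}

/* ant declared as an array: sizeof yields the size of all NUM_ANTS elements. */
static size_t size_via_array(void)
{
    ACO_Ant ant[NUM_ANTS];
    return sizeof(ant);
}

/* The C type itself: the size of exactly one element, padding included. */
static size_t size_via_type(void)
{
    return sizeof(ACO_Ant);
}
```

Only size_via_type() gives the extent that one element of the MPI datatype should have, which is why sizeof(ACO_Ant) is the value to pass to MPI_Type_create_resized.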
And please, never use MPI_ as prefix in your own variable or function names. This makes the code unreadable and is very misleading ("is that a predefined MPI datatype or a user-defined one?")
As for the last question, the author might have had a different structure in mind. Nothing stops you from using larger arrays as long as you call MPI_Type_create_struct with the correct number of significant elements.
Note: You don't have to commit MPI datatypes that are never used directly in communication calls. I.e., those two lines are unnecessary:
MPI_Type_commit(&MPI_TABU);
MPI_Type_commit(&MPI_PATH);
Upvotes: 1