Ankit Kulshrestha

Reputation: 111

Ghost cell exchange pattern in MPI

I'm implementing a distributed greyscale image convolution using MPI. My existing pattern is to read the image as a 1D flattened array at the root process, scatter it to all the processes (row decomposition), do an MPI_Gather at the root process, and then write the image out again as a 1D flattened array. Obviously, this doesn't give the expected results, since with image convolution the situation gets tricky at the boundaries.
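In rough sketch form, assuming the image height divides evenly across the processes (width, height, image and local are placeholder names):

int rows_per_rank = height / size;
std::vector<float> local(width * rows_per_rank);
// Root sends one block of complete rows to every process.
MPI_Scatter(image.data(), width * rows_per_rank, MPI_FLOAT,
            local.data(), width * rows_per_rank, MPI_FLOAT,
            0, MPI_COMM_WORLD);
// ... convolve the local rows; the first and last local rows are missing
// their neighbours, which is exactly the boundary problem ...
MPI_Gather(local.data(), width * rows_per_rank, MPI_FLOAT,
           image.data(), width * rows_per_rank, MPI_FLOAT,
           0, MPI_COMM_WORLD);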

So, to improve upon the aforementioned pattern, I want to implement the so-called ghost cell exchange pattern, wherein each process exchanges its boundary rows into its neighbours' ghost rows. In pseudocode:

if (rank == 0) {
    src = null
    dest = rank + 1
} else if (rank == size - 1) {
    src = rank - 1
    dest = null
} else {
    src = rank - 1
    dest = rank + 1
}

MPI_Sendrecv(&sendbuf[offset], slen, dest..
             &recvbuf[offset], rlen, src);

How do I allocate memory for the "ghost rows" on each process? Should I pre-allocate the memory and then scatter? I don't want to go for a custom-datatype solution, since that's overkill for the scope of the problem I'm considering.

Upvotes: 3

Views: 3316

Answers (1)

Zulan

Reputation: 22670

Ideally, the ghost cells should be part of the same memory block as your normal cells. That way, you can keep the addressing scheme simple. In that scheme, the image is distributed in multiples of complete rows, using MPI_Scatter and MPI_Gather. On a non-border rank you allocate enough memory for two additional ghost rows:

height = total_height / ranks;
std::vector<float> data(width * (height + 2));      // local rows plus two ghost rows
float* image = &data[width];                        // local image starts after the north ghost row
float* ghost_north = &data[0];
float* ghost_south = &data[width * (height + 1)];
float* inner_north = image;                         // first local row, sent north
float* inner_south = &image[width * (height - 1)];  // last local row, sent south
MPI_Scatter(root_image, width * height, MPI_FLOAT,
            image, width * height, MPI_FLOAT, ...);
...
iterations {
    MPI_Sendrecv(inner_north, width, MPI_FLOAT, north, tag,
                 ghost_north, width, MPI_FLOAT, north, tag, ...);
    MPI_Sendrecv(inner_south, width, MPI_FLOAT, south, tag,
                 ghost_south, width, MPI_FLOAT, south, tag, ...);
    ... compute ...
}
MPI_Gather(image, width * height, MPI_FLOAT,
           root_image, width * height, MPI_FLOAT, ...);

This pseudocode does not consider special border cases.
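One convenient way to cover those border cases (a sketch, not part of the pseudocode above) is to set the missing neighbour to MPI_PROC_NULL; MPI_Sendrecv treats a send or receive involving MPI_PROC_NULL as a no-op, so every rank can issue the same two exchange calls:

// North/south neighbours; the border ranks talk to MPI_PROC_NULL instead.
int north = (rank == 0)         ? MPI_PROC_NULL : rank - 1;
int south = (rank == ranks - 1) ? MPI_PROC_NULL : rank + 1;

// Identical calls on every rank; the ghost rows of the border ranks simply
// keep whatever they were initialized with (e.g. zero padding).
MPI_Sendrecv(inner_north, width, MPI_FLOAT, north, tag,
             ghost_north, width, MPI_FLOAT, north, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Sendrecv(inner_south, width, MPI_FLOAT, south, tag,
             ghost_south, width, MPI_FLOAT, south, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);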

The issue with the simple one-dimensional splitting is that the communication cost and the amount of additional halo data are non-optimal, especially for smaller images and a larger number of participating ranks: each rank always exchanges two full image rows regardless of how many ranks participate, whereas a two-dimensional decomposition shrinks the halo edges as more ranks are added.

Here is an excellent example by Rolf Rabenseifner regarding data decomposition and halo communication methods with MPI. He also explains how you can improve the communication methods. For a 2D decomposition, you will need derived MPI datatypes for both the initial communication and the vertical boundaries.
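For example (a sketch with placeholder names: local_width, local_height, interior, west and tag are not defined above), a vertical boundary column of a row-major tile can be described with MPI_Type_vector, so a single strided datatype covers the whole column:

// Padded tile: rows of stride = local_width + 2 floats, ghost columns included;
// `interior` points at the first interior element, `west` is the left
// neighbour rank or MPI_PROC_NULL at the image border.
int stride = local_width + 2;
MPI_Datatype column_t;
MPI_Type_vector(local_height, 1, stride, MPI_FLOAT, &column_t);
MPI_Type_commit(&column_t);

// Send our westmost interior column; receive the neighbour's column into the
// west ghost column, one element to the left of `interior`.
MPI_Sendrecv(interior,     1, column_t, west, tag,
             interior - 1, 1, column_t, west, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

MPI_Type_free(&column_t);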

Upvotes: 3
