I'm implementing a distributed greyscale image convolution using MPI. My existing pattern is to read the image as a flattened 1D array at the root process, scatter it to all processes (row decomposition), do the local computation, MPI_Gather the results back at the root process, and write the image out again as a flattened 1D array. Obviously, this doesn't give the expected results, since image convolution gets tricky at the partition boundaries.
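Roughly, that pattern looks like this (a minimal sketch with illustrative names; it assumes the image height divides evenly across the ranks and that full_image only holds valid data on the root):
int rows_per_rank = height / size;
std::vector<float> local(width * rows_per_rank);
MPI_Scatter(full_image.data(), width * rows_per_rank, MPI_FLOAT,
            local.data(), width * rows_per_rank, MPI_FLOAT,
            0, MPI_COMM_WORLD);
// ... convolve the local rows; the first and last local row are missing
// their neighbouring rows, which is the boundary problem mentioned above ...
MPI_Gather(local.data(), width * rows_per_rank, MPI_FLOAT,
           full_image.data(), width * rows_per_rank, MPI_FLOAT,
           0, MPI_COMM_WORLD);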
So, to improve upon the aforementioned pattern, I want to implement the so-called ghost cell exchange pattern, wherein neighbouring processes exchange their boundary rows into each other's ghost rows.
In pseudocode:
if (rank == 0) {
    src = MPI_PROC_NULL;    /* no neighbour above */
    dest = rank + 1;
} else if (rank == size - 1) {
    src = rank - 1;
    dest = MPI_PROC_NULL;   /* no neighbour below */
} else {
    src = rank - 1;
    dest = rank + 1;
}
MPI_Sendrecv(&sendbuf[offset], slen, MPI_FLOAT, dest, tag,
             &recvbuf[offset], rlen, MPI_FLOAT, src, tag, ...);
How do I allocate memory for the "ghost rows" on each process? Should I pre-allocate the memory and then scatter? I don't want to go for a custom-datatype solution, since that's overkill for the scope of the problem I'm considering.
Upvotes: 3
Views: 3316
Ideally, the ghost cells should be part of the same memory block as your normal cells. That way, you can keep the addressing scheme simple. In that scheme, the image is distributed in multiples of complete rows, using MPI_Scatter and MPI_Gather. On a non-border rank, you allocate enough memory for two additional ghost rows:
height = total_height / ranks;
std::vector<float> data(width * (height + 2));
float* image = &data[width];                        // first own row
float* ghost_north = &data[0];                      // ghost row above
float* ghost_south = &data[width * (height + 1)];   // ghost row below
float* inner_north = image;                         // first own row (sent north)
float* inner_south = &image[width * (height - 1)];  // last own row (sent south)
MPI_Scatter(root_image, width * height, MPI_FLOAT,
            image, width * height, MPI_FLOAT, ...);
...
iterations {
    MPI_Sendrecv(inner_north, width, MPI_FLOAT, north, tag,
                 ghost_north, width, MPI_FLOAT, north, tag, ...);
    MPI_Sendrecv(inner_south, width, MPI_FLOAT, south, tag,
                 ghost_south, width, MPI_FLOAT, south, tag, ...);
    ... compute ...
}
MPI_Gather(image, width * height, MPI_FLOAT,
           root_image, width * height, MPI_FLOAT, ...);
This pseudocode does not consider special border cases.
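For completeness, one simple way to handle those border cases (a sketch reusing the buffer names from above) is to map the missing neighbour to MPI_PROC_NULL, which turns the matching send/receive of the MPI_Sendrecv into a no-op:
int north = (rank == 0) ? MPI_PROC_NULL : rank - 1;
int south = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
MPI_Sendrecv(inner_north, width, MPI_FLOAT, north, tag,
             ghost_north, width, MPI_FLOAT, north, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Sendrecv(inner_south, width, MPI_FLOAT, south, tag,
             ghost_south, width, MPI_FLOAT, south, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
The border ranks then simply never fill one of their ghost rows, so keeping the uniform allocation of height + 2 rows everywhere is the simplest choice.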
The issue with the simple one-dimensional splitting is that the communication cost and the amount of additional halo data are non-optimal, especially for smaller images and a larger number of participating ranks.
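As a rough illustration: with a 1024 x 1024 image on 16 ranks, the 1D row split gives each interior rank two halo rows of 1024 pixels (2048 halo pixels per rank), and that number does not shrink as you add more ranks. A 4 x 4 2D split gives each interior rank four halo edges of 256 pixels (about 1024 halo pixels per rank), and this keeps shrinking as the number of ranks grows.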
Here is an excellent example by Rolf Rabenseifner regarding data decomposition and halo communication methods with MPI. He also explains how you can improve the communication methods. For a 2D decomposition, you will need derived MPI datatypes for both the initial communication and the vertical boundaries.
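As a rough sketch (illustrative names, assuming the local block is stored row-major with one ghost row/column of padding on each side), a vertical boundary column can be described with MPI_Type_vector:
MPI_Datatype column_type;
MPI_Type_vector(local_height,     // one element per interior row
                1,                // block length: a single pixel
                local_width + 2,  // stride: padded row length incl. ghost columns
                MPI_FLOAT, &column_type);
MPI_Type_commit(&column_type);
// Exchange the first interior column with the west neighbour's ghost column.
MPI_Sendrecv(&local[1 * (local_width + 2) + 1], 1, column_type, west, tag,
             &local[1 * (local_width + 2) + 0], 1, column_type, west, tag,
             comm, MPI_STATUS_IGNORE);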
Upvotes: 3