user8459483

Reputation: 33

MPI Scatter losing values from the final partition

I have an array of numbers that I need to scatter to each node in an MPI program. The setup is that I have an array of numbers from 1 to 100 with all the even numbers except the number 2 removed. Due to the way I removed the even numbers, the number 2 is the last element in the array.

So my array contains 51 numbers: the odd numbers 1, 3, 5, ..., 99, followed by 2. My problem is that the final partition after a scatter does not contain the last three numbers in the array: 97, 99 and 2.

int *oddsOnly = // array set up as above: 1, 3, 5, ..., 99, 2
int chunkSize = (oddsFound / worldSize);
int *localPartition = new int[chunkSize];

// Send everyone the chunk size
MPI_Bcast(&chunkSize, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Scatter(oddsOnly, chunkSize, MPI_INT, localPartition, chunkSize, MPI_INT, 0, MPI_COMM_WORLD);

I understand the issue is that the number of ranks and the array size don't divide evenly. I've tried

chunkSize = ceil(oddsFound / worldSize);

And

chunkSize = (oddsFound / worldSize) + 1;

But this then gives me duplicates in the split.

As it stands I get

0 scatter is 1  3   5   7   9   11  13  15  17  19  21  23

1 scatter is 25 27  29  31  33  35  37  39  41  43  45  47  

2 scatter is 49 51  53  55  57  59  61  63  65  67  69  71

3 scatter is 73 75  77  79  81  83  85  87  89  91  93  95  

Is it possible to do what I'm attempting tidily? I had a look at MPI_Scatterv but I'm not sure it's what I need. I should add I don't have to scatter, so maybe there is a better MPI way of doing it.

Upvotes: 1

Views: 987

Answers (1)

dreamcrash

Reputation: 51393

One approach is to use MPI_Scatter and add padding to the array so that its size is evenly divisible by the number of processes. However, whenever a process has to manipulate that array, it needs to ignore the padding part.
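
For illustration, here is a minimal sketch of the padding idea. It assumes the same size, numbers, world_rank and world_size variables as in the MPI_Scatterv example below, and uses -1 as a sentinel value (which only works if -1 never appears in the real data):

int chunk = (size + world_size - 1) / world_size; // per-process count, rounded up
int padded_size = chunk * world_size;

int *padded = NULL;
if(world_rank == 0){
   padded = malloc(sizeof(int) * padded_size);
   for(int i = 0; i < size; i++)
      padded[i] = numbers[i];   // the real data
   for(int i = size; i < padded_size; i++)
      padded[i] = -1;           // padding that every process must ignore
}

int *local = malloc(sizeof(int) * chunk);
MPI_Scatter(padded, chunk, MPI_INT, local, chunk, MPI_INT, 0, MPI_COMM_WORLD);

// Whenever a process works on its chunk, it skips the sentinel entries
for(int i = 0; i < chunk; i++)
   if(local[i] != -1){
      /* ... use local[i] ... */
   }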

Another approach is to use MPI_Scatterv, and "manually" divide the blocks among the processes.

An example of such an approach:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank; 
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);

    int size = 50;
    int *numbers= NULL;
    if(world_rank == 0){
      numbers = malloc(sizeof(int) * size);
      for(int i = 0; i < size; i++)
         numbers[i] = (2*i + 1); 
    }
    // The Important Part
    int sendcounts [world_size];
    int displs [world_size]; 
    int res = size % world_size;
    int size_per_process = size / world_size;
    int increment = 0;
    for(int processID = 0; processID < world_size; processID++){
       displs[processID] = increment;
       sendcounts[processID] = (processID + 1 <= res) ? size_per_process + 1 : size_per_process;
       increment += sendcounts[processID];
    }
    int process_size = sendcounts[world_rank];
    int local_numbers[process_size];
    MPI_Scatterv(numbers, sendcounts, displs, MPI_INT, local_numbers, process_size, MPI_INT, 0, MPI_COMM_WORLD);  
    if(world_rank == world_size - 1){
       // print this rank's chunk (process_size is the count actually received)
       for(int i = 0; i < process_size; i++)
          printf("%d ", local_numbers[i]);
       printf("\n");
    }

    MPI_Finalize();
    return 0;
}

It is a bit convoluted, but the good news is that it is always the same for this type of distribution.
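
To try it, the example can be compiled and run in the usual way (the file name here is just a placeholder):

mpicc scatterv_example.c -o scatterv_example
mpirun -np 4 ./scatterv_example

With 4 processes and size = 50, the last rank should print 77 79 81 83 85 87 89 91 93 95 97 99.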

Explanation of the code:

First, we need to understand the parameters of MPI_Scatterv:

MPI_Scatterv

Scatters a buffer in parts to all processes in a communicator

int MPI_Scatterv(const void *sendbuf, const int *sendcounts, const int *displs,
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm)

Input Parameters

sendbuf address of send buffer (choice, significant only at root)
sendcounts integer array (of length group size) specifying the number of elements to send to each processor
displs integer array (of length group size). Entry i specifies the displacement (relative to sendbuf) from which to take the outgoing data to process i
sendtype data type of send buffer elements (handle)
recvcount number of elements in receive buffer (integer)
recvtype data type of receive buffer elements (handle)
root rank of sending process (integer)
comm communicator (handle)

First, we create the array with the number of elements that each process will send:

int sendcounts [world_size];

then we create the array with the displacements:

int displs [world_size]; 

then we calculate the number of extra elements left over after an even division:

int res = size % world_size;

then we calculate the number of elements that each process is guaranteed to have:

int size_per_process = size / world_size;

Finally, we calculate the size and the displacement of each process:

int increment = 0;
for(int processID = 0; processID < world_size; processID++){
   displs[processID] = increment;
   sendcounts[processID] = (processID + 1 <= res) ? size_per_process + 1 : size_per_process;
   increment += sendcounts[processID];
}

the line:

   sendcounts[processID] = (processID + 1 <= res) ? size_per_process + 1 : size_per_process;

means that the extra elements are assigned from left to right, i.e., the first res ranks (starting at rank 0) each receive one extra element.
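
For example, with size = 50 and world_size = 4, we get res = 2 and size_per_process = 12, so the loop produces sendcounts = {13, 13, 12, 12} and displs = {0, 13, 26, 38}: the first two ranks each take one extra element, and each displacement is simply the running sum of the previous counts.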

Upvotes: 3
