Raspberry

Reputation: 337

Passing a large 2-dimensional array in MPI C++

I have a task to speed up a program using MPI. Let's assume I have a large 2d array (1000x1000 or bigger) on the input. I have a working sequential program that divides the 2d array into chunks (for example 10x10) and calculates a result, which is a double, for each chunk (so we have a function whose argument is a 10x10 2d array and whose result is a double number).

My first idea to speed up:

  1. Create a 1d array of size N*N (for example 10x10 = 100) and send the array to another process
double* buffer = new double[dataPortionSize];
//copy some data to buffer
MPI_Send(buffer, dataPortionSize, MPI_DOUBLE, currentProcess, 1, MPI_COMM_WORLD);
  2. Receive it in another process, calculate the result, and send the result back
double* buf = new double[dataPortionSize];
MPI_Recv(buf, dataPortionSize, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, status);
double result = function->calc(buf);
MPI_Send(&result, 1, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD);

This program was much slower than the sequential version. It looks like MPI needs a lot of time to pass an array to another process.

My second idea:

  1. Pass the whole 2d input array to all processes
// data is protected field in base class, it is injected during runtime 
MPI_Send(&(data[0][0]), dataSize * dataSize, MPI_DOUBLE, currentProcess, 1, MPI_COMM_WORLD);
  2. And receive the data like this
double **arrayAlloc(int size) {
    double **result = new double*[size];   // array of row pointers
    for (int i = 0; i < size; i++)
        result[i] = new double[size];      // each row allocated separately
    return result;
}

double **data = arrayAlloc(dataSize);
MPI_Recv(&data[0][0], dataSize * dataSize, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, status);

Unfortunately, I got a bunch of errors during execution.

The crashes are pretty random; the program even finished successfully a couple of times.

My third idea:

Pass the memory address to all processes, but I found this:

MPI processes cannot read each others' memory, and virtual addressing makes one process' pointer completely meaningless to another.

Does anyone have an idea how to speed this up? I understand that the key to performance is passing the array/arrays to the processes in an efficient way, but I don't know how to do that.

Upvotes: 0

Views: 455

Answers (1)

Homer512

Reputation: 13295

You have multiple issues here. I'll try to go through them in some arbitrary order.

  1. As someone else explained, your second attempt fails because MPI expects you to work with a single contiguous array, not an array of pointers. So you want to allocate something like matrix = new double[rows * cols] and then access individual rows as &matrix[row * cols] or an individual value as matrix[row * cols + col].

This would be a data structure that you can send, receive, scatter, and gather with MPI. It would also be faster in general.
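For illustration, a minimal sketch of that layout (rows, cols, dest, recvbuf, and the tag value are placeholder names, not from the original code):

// sender side: one contiguous block instead of an array of row pointers
double *matrix = new double[rows * cols];
// element (row, col) lives at matrix[row * cols + col],
// and &matrix[row * cols] is the start of row `row`
MPI_Send(matrix, rows * cols, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);

// receiver side: the same flat layout, received in a single call
double *recvbuf = new double[rows * cols];
MPI_Recv(recvbuf, rows * cols, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);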

  2. You are correct to assume that MPI takes time to transfer data. Even in the best case it costs as much as a memcpy, and usually significantly more. If your program does too little work before transferring the data, it will not be faster.

  3. Your first attempt may have failed because the first process doesn't do anything useful while waiting for the result. You didn't include the receive operation in your code sample. However, if you wrote something like this:

for(int block = 0; block < nblocks; ++block) {
  generate_data(buf);
  MPI_Send(buf, ...);
  MPI_Recv(buf, ...);
}

Then you cannot expect a speedup because the process is not doing anything useful while waiting for the result. You can avoid this with double buffering. Let the first process generate the next data block before waiting in the receive operation for the result. Something like this:

generate_data(0, input); /* 0-th block */
MPI_Send(input, ...);
for(int block = 1; block < nblocks; ++block) {
  generate_data(block, input); /* 1st up to nth block */
  MPI_Recv(output, ...); /* 0-th up to n-1-th block */
  MPI_Send(input, ...);
}
MPI_Recv(output, ...); /* n-th block */

Now calculations in both processes can overlap.

  4. You shouldn't use MPI_Send and MPI_Recv to begin with! MPI is designed for collective operations like MPI_Scatter and MPI_Gather. What you should do is generate N blocks for N processes and MPI_Scatter them across all processes. Then let each process compute its result, and MPI_Gather the results back at the root process.
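A rough sketch of that pattern, assuming one block of blockSize doubles per process (blockSize, allblocks, and calc are placeholder names):

int rank, worldsize;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &worldsize);

double *allblocks = nullptr;
if (rank == 0)
    allblocks = new double[worldsize * blockSize];  // root generates all input blocks

double *myblock = new double[blockSize];
MPI_Scatter(allblocks, blockSize, MPI_DOUBLE,
            myblock, blockSize, MPI_DOUBLE, 0, MPI_COMM_WORLD);

double myresult = calc(myblock);                    // every process computes its block

double *results = nullptr;
if (rank == 0)
    results = new double[worldsize];                // one double per process
MPI_Gather(&myresult, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);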

  5. Even better, let every process work independently, if possible. Of course this depends on your data, but if you can generate and process data blocks independently from one another, don't do any communication. Just let them all work alone. Something like this:

int rank, worldsize;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &worldsize); 
for(int block = rank; block < nblocks; block += worldsize) {
    process_data(block);
}

Upvotes: 2
