Parallelize a for loop using Boost.MPI

Asked by user3658306

I am learning to use Boost.MPI to parallelize a large amount of computation; the code below is just a simple test to see whether I have the MPI logic right. However, I cannot get it to work. I use world.size() = 10, there are 50 elements in total in the data array, and each process does 5 iterations. I would like to update the data array by having each process send its updated copy to the root process; the root process should then receive the updated arrays and print them out. But only a few elements end up updated.

Thanks for helping me.

#include <boost/mpi.hpp>
#include <iostream>
#include <cstdlib>

namespace mpi = boost::mpi;
using namespace std;

#define max_rows 100
int data[max_rows];

int modifyArr(const int index, const int arr[]) {
  return arr[index]*2+1;
}

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);
  mpi::communicator world;

  int num_rows = 50;
  int my_number;

  if (world.rank() == 0) {
    for ( int i = 0; i < num_rows; i++)
        data[i] = i + 1;
  }

  broadcast(world, data, 0);

  for (int i = world.rank(); i < num_rows; i += world.size()) {
    my_number = modifyArr(i, data);
    data[i]   = my_number;

    world.send(0, 1, data);

    //cout << "i=" << i << " my_number=" << my_number << endl;

    if (world.rank() == 0)
      for (int j = 1; j < world.size(); j++) 
        mpi::status s = world.recv(boost::mpi::any_source, 1, data);
  }

  if (world.rank() == 0) {
    for ( int i = 0; i < num_rows; i++)
      cout << "i=" << i << " results = " << data[i] << endl;
  }

  return 0;
}


Answers (1)

Answered by Richard

Your problem is probably here:

mpi::status s = world.recv(boost::mpi::any_source, 1, data);

This is the only way data can get back to the master node.

However, you do not tell the master node where in data to store the answers it receives. Since data is just the address of the start of the array, every incoming array is written at the beginning of data, each receive overwriting the previous one, so only the last sender's updates survive.
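
A minimal fix along those lines (a sketch, untested; tmp is a hypothetical scratch buffer): receive each message into the scratch array, then use the sender's rank from the returned status to copy back only the elements that sender actually owns:

int tmp[max_rows];
mpi::status s = world.recv(mpi::any_source, 1, tmp, num_rows);
// Copy back only the interleaved elements owned by the sending rank.
for (int i = s.source(); i < num_rows; i += world.size())
  data[i] = tmp[i];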

Interleaving which elements of the array each node processes is a pretty bad idea anyway. You should assign a contiguous block of the array to each node so that you can send an entire chunk at once; that reduces communication overhead significantly.
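
For example, here is a minimal sketch of that block approach, which could replace the whole send/recv loop in your main() (it assumes num_rows divides evenly by world.size() and needs #include <vector>):

int chunk = num_rows / world.size();      // size of each rank's block
int begin = world.rank() * chunk;         // start of this rank's block

std::vector<int> local(chunk);
for (int i = 0; i < chunk; i++)
  local[i] = modifyArr(begin + i, data);  // work only on this rank's block

std::vector<int> gathered;                // filled only on the root
mpi::gather(world, local.data(), chunk, gathered, 0);

gather collects the blocks in rank order, so on rank 0 gathered already holds all num_rows results in the right order, and each rank sends exactly one message.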

Also, if your issue is simply speeding up for loops, you should consider OpenMP, which can do things like this:

#pragma omp parallel for
for (int i = 0; i < 100; i++)
  data[i] *= 4;

Bam! I just split that for loop up between all of my threads with no further work needed.
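
(Note that OpenMP parallelizes with threads inside a single process, and you have to enable it at compile time, e.g. with -fopenmp on GCC/Clang or /openmp on MSVC.)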
