Reputation: 2163
I have an array of data. What I was trying to do is this:
Use rank 0 to bcast the data to 50 nodes. Each node has one MPI process on it, with 16 cores available to that process. Each MPI process then calls Python multiprocessing, does some calculations, and saves the results. The MPI process then changes some variable and runs multiprocessing again, and so on.
So the nodes do not need to communicate with each other beyond the initial startup, in which they all receive some data.
The multiprocessing is not working out so well, so now I want to do everything with MPI.
How can I (or is it even possible to) use an array of integers that refers to MPI ranks for bcast or scatter? For example, with ranks 1-1000 and 12 cores per node, I want to bcast the data to every 12th rank. Then I want each of those ranks to scatter the data to ranks 12th+1 through 12th+12 on the same node. A sketch of the mapping I have in mind is below.
This requires the first bcast to communicate with totalrank/12 ranks; each of those ranks is then responsible for sending data to the ranks on the same node, gathering the results, saving them, and then sending more data to the ranks on the same node.
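To make the grouping concrete, something like this (assuming 0-based ranks and groups of 12; node_master is just a name I made up for illustration):

procs_per_node = 12

def node_master(rank):
    # rank that receives the initial bcast for this node's group
    return (rank // procs_per_node) * procs_per_node

# ranks 0-11 belong to master 0, ranks 12-23 to master 12, and so on
assert node_master(5) == 0
assert node_master(17) == 12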
Upvotes: 0
Views: 1325
Reputation: 9489
I don't know mpi4py well enough to give you a code sample with it, but here is what a solution could look like in C++. I'm sure you can easily derive the Python code from it; a tentative mpi4py transcription is sketched at the end of this answer.
#include <mpi.h>
#include <iostream>
#include <zlib.h> // for crc32

using namespace std;

int main( int argc, char *argv[] ) {
    MPI_Init( &argc, &argv );

    // get size and rank
    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // get the compute node name
    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name( name, &len );

    // get a unique non-negative int from each node name
    // using crc32 from zlib (just a possible solution);
    // the color passed to MPI_Comm_split() must be non-negative,
    // so the sign bit is masked off
    uLong crc = crc32( 0L, Z_NULL, 0 );
    int color = crc32( crc, ( const unsigned char* )name, len ) & 0x7fffffff;

    // split the communicator into processes of the same node
    MPI_Comm nodeComm;
    MPI_Comm_split( MPI_COMM_WORLD, color, rank, &nodeComm );

    // get the rank on the node
    int nodeRank;
    MPI_Comm_rank( nodeComm, &nodeRank );

    // create comms of processes of the same local ranks
    MPI_Comm peersComm;
    MPI_Comm_split( MPI_COMM_WORLD, nodeRank, rank, &peersComm );

    // now, masters are all the processes of nodeRank 0:
    // they can communicate among themselves with peersComm
    // and with their local slaves with nodeComm
    int workToDo = 0;
    if ( rank == 0 ) workToDo = 1000;
    cout << "Initially [" << rank << "] on node "
         << name << " has " << workToDo << endl;

    MPI_Bcast( &workToDo, 1, MPI_INT, 0, peersComm );
    cout << "After first Bcast [" << rank << "] on node "
         << name << " has " << workToDo << endl;

    if ( nodeRank == 0 ) workToDo += rank;
    MPI_Bcast( &workToDo, 1, MPI_INT, 0, nodeComm );
    cout << "After second Bcast [" << rank << "] on node "
         << name << " has " << workToDo << endl;

    // cleaning up
    MPI_Comm_free( &peersComm );
    MPI_Comm_free( &nodeComm );
    MPI_Finalize();

    return 0;
}
As you can see, you first create communicators of the processes on the same node. Then you create peer communicators of all the processes with the same local rank across nodes. From there, your master process of global rank 0 sends data to the local masters, and they distribute the work on the node they are responsible for.
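For reference, a rough mpi4py transcription of the same idea might look like this (an untested sketch; it assumes only the standard mpi4py API):

from mpi4py import MPI
import zlib

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# derive a non-negative color from the node name, as in the C++ version
name = MPI.Get_processor_name()
color = zlib.crc32(name.encode()) & 0x7fffffff

# communicator of the processes on the same node
node_comm = comm.Split(color, rank)
node_rank = node_comm.Get_rank()

# communicator of the processes with the same local rank across nodes
peers_comm = comm.Split(node_rank, rank)

work_to_do = 1000 if rank == 0 else 0
# global master -> local masters
work_to_do = peers_comm.bcast(work_to_do, root=0)
# local masters -> the other processes on their node
work_to_do = node_comm.bcast(work_to_do, root=0)

peers_comm.Free()
node_comm.Free()

On MPI-3 implementations, the node communicator can also be obtained without hashing the processor name, via comm.Split_type(MPI.COMM_TYPE_SHARED).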
Upvotes: 4