Reputation: 491
The issue I am trying to resolve is the following:
The C++ serial code I have computes across a large 2D matrix. To optimize this process, I wish to split this large 2D matrix and run on 4 nodes (say) using MPI. The only communication that occurs between nodes is the sharing of edge values at the end of each time step. Every node shares the edge array data, A[i][j], with its neighbor.
Based on reading about MPI, I have the following scheme to be implemented.
if (myrank == 0)
{
for (i= 0 to x)
for (y= 0 to y)
{
C++ CODE IMPLEMENTATION
....
MPI_SEND(A[x][0], A[x][1], A[x][2], Destination= 1.....)
MPI_RECEIVE(B[0][0], B[0][1]......Sender = 1.....)
MPI_BARRIER
}
if (myrank == 1)
{
for (i = x+1 to xx)
for (y = 0 to y)
{
C++ CODE IMPLEMENTATION
....
MPI_SEND(B[x][0], B[x][1], B[x][2], Destination= 0.....)
MPI_RECEIVE(A[0][0], A[0][1]......Sender = 1.....)
MPI BARRIER
}
I wanted to know if my approach is correct and also would appreciate any guidance on other MPI functions too look into for implementation.
Thanks, Ashwin.
Upvotes: 24
Views: 54093
Reputation: 56
This question has already been answered quite thoroughly by Jonathan Dursi; however, as Jonathan Leffler has pointed out in his comment to Jonathan Dursi's answer, C's multi-dimensional arrays are a contiguous block of memory. Therefore, I would like to point out that for a not-too-large 2d array, a 2d array could simply be created on the stack:
int A[N][M];
Since, the memory is contiguous, the array can be sent as it is:
MPI_Send(A, N*M, MPI_INT,1, tagA, MPI_COMM_WORLD);
On the receiving side, the array can be received into a 1d array of size N*M (which can then be copied into a 2d array if necessary):
int A_1d[N*M];
MPI_Recv(A_1d, N*M, MPI_INT,0,tagA, MPI_COMM_WORLD,&status);
//copying the array to a 2d-array
int A_2d[N][M];
for (int i = 0; i < N; i++){
for (int j = 0; j < M; j++){
A_2d[i][j] = A_1d[(i*M)+j]
}
}
Copying the array does cause twice the memory to be used, so it would be better to simply use A_1d by accessing its elements through A_1d[(i*M)+j].
Upvotes: 0
Reputation: 6357
First you don't need that much barrier Second, you should really send your data as a single block as multiple send/receive blocking their way will result in poor performances.
Upvotes: 4
Reputation: 50947
Just to amplify Joel's points a bit:
This goes much easier if you allocate your arrays so that they're contiguous (something C's "multidimensional arrays" don't give you automatically:)
int **alloc_2d_int(int rows, int cols) {
int *data = (int *)malloc(rows*cols*sizeof(int));
int **array= (int **)malloc(rows*sizeof(int*));
for (int i=0; i<rows; i++)
array[i] = &(data[cols*i]);
return array;
}
/*...*/
int **A;
/*...*/
A = alloc_2d_init(N,M);
Then, you can do sends and recieves of the entire NxM array with
MPI_Send(&(A[0][0]), N*M, MPI_INT, destination, tag, MPI_COMM_WORLD);
and when you're done, free the memory with
free(A[0]);
free(A);
Also, MPI_Recv
is a blocking recieve, and MPI_Send
can be a blocking send. One thing that means, as per Joel's point, is that you definately don't need Barriers. Further, it means that if you have a send/recieve pattern as above, you can get yourself into a deadlock situation -- everyone is sending, no one is recieving. Safer is:
if (myrank == 0) {
MPI_Send(&(A[0][0]), N*M, MPI_INT, 1, tagA, MPI_COMM_WORLD);
MPI_Recv(&(B[0][0]), N*M, MPI_INT, 1, tagB, MPI_COMM_WORLD, &status);
} else if (myrank == 1) {
MPI_Recv(&(A[0][0]), N*M, MPI_INT, 0, tagA, MPI_COMM_WORLD, &status);
MPI_Send(&(B[0][0]), N*M, MPI_INT, 0, tagB, MPI_COMM_WORLD);
}
Another, more general, approach is to use MPI_Sendrecv
:
int *sendptr, *recvptr;
int neigh = MPI_PROC_NULL;
if (myrank == 0) {
sendptr = &(A[0][0]);
recvptr = &(B[0][0]);
neigh = 1;
} else {
sendptr = &(B[0][0]);
recvptr = &(A[0][0]);
neigh = 0;
}
MPI_Sendrecv(sendptr, N*M, MPI_INT, neigh, tagA, recvptr, N*M, MPI_INT, neigh, tagB, MPI_COMM_WORLD, &status);
or nonblocking sends and/or recieves.
Upvotes: 44