Reputation: 39
I am trying to solve a problem with MPI, OpenMP and the combination of the two. Both the MPI and the OpenMP versions run fine, but when I try the hybrid MPI-OpenMP version of the code I can't get past an error that shows up at the end of the post. The MPI version of the code:
for(i=0;i<loop;i++)
{
    for(j=start;j<end+1;j++)
    {
        for(k=0;k<N;k++)
        {
            if(j!=k)
            {
                dx=C[k*3+0]-C[j*3+0];
                dy=C[k*3+1]-C[j*3+1];
                dz=C[k*3+2]-C[j*3+2];
                d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));
                dCube=pow(d,3);
                mSquare=pow(m,2);
                F[(j-start)*3+0]-=G*mSquare/dCube*dx;
                F[(j-start)*3+1]-=G*mSquare/dCube*dy;
                F[(j-start)*3+2]-=G*mSquare/dCube*dz;
            }
        }
    }
    for(j=0;j<length;j++)
    {
        for(k=0;k<3;k++)
        {
            a=F[j*3+k]/m;
            V[j*3+k]=V[j*3+k]+a*dt;
            Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;
        }
    }
    MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}
- loop: the number of times the process will take place (the number of time steps).
- C: the coordinate table, which needs to be whole on every node for the right results.
- start & end: tell every node which part of the table it is responsible for.
- MPI_Allgatherv at the end updates the C table with the new values stored in the Cw tables before the next i iteration starts.
- counts: the amount of data that MPI_Allgatherv will transfer from each node.
- displs: the position in C where every node must put its Cw data.
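For reference, the following is a minimal sketch (not part of the original code; the helper name block_distribution and its signature are assumptions) of how start, length, counts and displs could be filled in for such a block distribution of the N bodies:

/* Hypothetical helper: computes a block distribution of N bodies over
 * `size` ranks and fills counts/displs for MPI_Allgatherv.
 * counts and displs are in units of MPI_DOUBLE, hence the factor 3 (x, y, z). */
void block_distribution(int N, int rank, int size,
                        int *start, int *length,
                        int *counts, int *displs)
{
    int base = N / size;              /* minimum bodies per rank              */
    int rem  = N % size;              /* first `rem` ranks get one extra body */

    for (int r = 0; r < size; r++)
    {
        int len = base + (r < rem ? 1 : 0);
        int off = r * base + (r < rem ? r : rem);
        counts[r] = len * 3;          /* doubles contributed by rank r        */
        displs[r] = off * 3;          /* where rank r's block lands in C      */
        if (r == rank)
        {
            *start  = off;            /* first body this rank owns            */
            *length = len;            /* end == start + length - 1            */
        }
    }
}

Here counts and displs are expressed in numbers of MPI_DOUBLE elements, which is what MPI_Allgatherv expects when the datatype is MPI_DOUBLE.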
The pure OpenMP version of the code:
#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
    for(i=0;i<loop;i++)
    {
        #pragma omp for schedule(static)
        for(j=0;j<N;j++)
        {
            for(k=0;k<N;k++)
            {
                if(j!=k)
                {
                    dx=C[k*3+0]-C[j*3+0];
                    dy=C[k*3+1]-C[j*3+1];
                    dz=C[k*3+2]-C[j*3+2];
                    d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));
                    dCube=pow(d,3);
                    mSquare=pow(m,2);
                    F[j*3+0]-=G*mSquare/dCube*dx;
                    F[j*3+1]-=G*mSquare/dCube*dy;
                    F[j*3+2]-=G*mSquare/dCube*dz;
                }
            }
        }
        #pragma omp for schedule(static)
        for(j=0;j<N;j++)
        {
            for(k=0;k<3;k++)
            {
                a=F[j*3+k]/m;
                V[j*3+k]=V[j*3+k]+a*dt;
                C[j*3+k]=C[j*3+k]+V[j*3+k]*dt;
            }
        }
    }
}
The i loop cannot be parallelized because an updated C is needed in every iteration, so the parallel part starts from the j loops. The hybrid MPI-OpenMP version of the code:
#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
    for(i=0;i<loop;i++)
    {
        #pragma omp for schedule(static)
        for(j=start;j<end+1;j++)
        {
            for(k=0;k<N;k++)
            {
                if(j!=k)
                {
                    dx=C[k*3+0]-C[j*3+0];
                    dy=C[k*3+1]-C[j*3+1];
                    dz=C[k*3+2]-C[j*3+2];
                    d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));
                    dCube=pow(d,3);
                    mSquare=pow(m,2);
                    F[(j-start)*3+0]-=G*mSquare/dCube*dx;
                    F[(j-start)*3+1]-=G*mSquare/dCube*dy;
                    F[(j-start)*3+2]-=G*mSquare/dCube*dz;
                }
            }
        }
        #pragma omp for schedule(static)
        for(j=0;j<length;j++)
        {
            for(k=0;k<3;k++)
            {
                a=F[j*3+k]/m;
                V[j*3+k]=V[j*3+k]+a*dt;
                Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;
            }
        }
        MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
    }
}
What I've tried to achieve here with MPI is to tell every node which part of the C table to work on, as in the MPI approach above, and then OpenMP calculates the Cw table corresponding to that part. Then MPI gathers the results with MPI_Allgatherv into the C table and starts over, for loop iterations. The problem is that when I run the code above this error shows up:
Assertion failed in file src/mpid/ch3/src/ch3u_request.c at line 572: *(&incomplete) >= 0
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10389 RUNNING AT felix-desktop
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
The problem lies in the MPI_Allgatherv line of code, but I can't figure out what is causing it.
Upvotes: 2
Views: 1932
Reputation: 51393
You are calling MPI_Allgatherv inside a parallel region. Consequently, that MPI routine is being called on the same communicator (i.e., MPI_COMM_WORLD) by multiple threads, which is not allowed by the MPI standard. Nevertheless, you can ensure that the MPI_Allgatherv routine is invoked by a single thread:
#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
    ...
    #pragma omp master
    MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
    #pragma omp barrier
}
By adding the OpenMP master construct, one ensures that only one thread (i.e., the master thread) calls MPI_Allgatherv. To be sure that there are no race conditions (i.e., threads changing the contents that will be used in the MPI_Allgatherv call), an OpenMP barrier was added right after the MPI_Allgatherv. Hence, all threads of each process will wait until the master thread of their process finishes performing the MPI_Allgatherv call.
If the MPI version that you are using has a thread support level of MPI_THREAD_SERIALIZED, then you can use the single construct instead of master. In any case, ensure that the thread support level of your MPI implementation is at least MPI_THREAD_FUNNELED.
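For example, here is a minimal sketch (assuming a standard MPI installation; this is not from the code above) of requesting and verifying the thread support level at initialization:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Request FUNNELED: only the master thread will make MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    if (provided < MPI_THREAD_FUNNELED)
    {
        fprintf(stderr, "MPI implementation does not support MPI_THREAD_FUNNELED\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... hybrid MPI + OpenMP work goes here ... */

    MPI_Finalize();
    return 0;
}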
Another approach is to move the MPI_Allgatherv routine outside of the parallel region:
for(i=0;i<loop;i++)
{
    /* i is now the sequential loop counter, so it no longer needs to be private */
    #pragma omp parallel private(k,dx,dy,dz,d,dCube,mSquare,a)
    {
        ...
    }
    MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}
The overhead of creating multiple parallel regions should be mitigated by the fact that a decent OpenMP implementation will use a thread pool, hence initializing the threads only once.
Upvotes: 4