Serris Filippos

Reputation: 39

Combine MPI with OpenMP

I am trying to solve a problem with MPI, with OpenMP, and with the combination of the two. Both the MPI and the OpenMP versions run fine, but when I try the hybrid MPI-OpenMP version of the code I can't get past the error shown at the end of the post.

MPI version

for(i=0;i<loop;i++)               /* time steps */
{
  for(j=start;j<end+1;j++)        /* bodies assigned to this process */
  {
    for(k=0;k<N;k++)              /* accumulate the force from every other body */
    {
      if(j!=k)
      {
        dx=C[k*3+0]-C[j*3+0];
        dy=C[k*3+1]-C[j*3+1];
        dz=C[k*3+2]-C[j*3+2];

        d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

        dCube=pow(d,3);
        mSquare=pow(m,2);

        F[(j-start)*3+0]-=G*mSquare/dCube*dx;
        F[(j-start)*3+1]-=G*mSquare/dCube*dy;
        F[(j-start)*3+2]-=G*mSquare/dCube*dz;
      }
    }
  }
  for(j=0;j<length;j++)                           /* update the bodies of this process */
  {
    for(k=0;k<3;k++)                              /* x, y, z components */
    {
      a=F[j*3+k]/m;                               /* acceleration */
      V[j*3+k]=V[j*3+k]+a*dt;                     /* new velocity */
      Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;     /* new position, written to the local buffer */
    }
  }
  /* gather the updated positions of all processes into C */
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}

OpenMP version

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
  for(i=0;i<loop;i++)                          /* time steps, executed by every thread */
  {
    #pragma omp for schedule(static)
      for(j=0;j<N;j++)                         /* j is implicitly private as the omp for loop variable */
      {
        for(k=0;k<N;k++)                       /* accumulate the force from every other body */
        {
          if(j!=k)
          {
            dx=C[k*3+0]-C[j*3+0];
            dy=C[k*3+1]-C[j*3+1];
            dz=C[k*3+2]-C[j*3+2];

            d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

            dCube=pow(d,3);
            mSquare=pow(m,2);

            F[j*3+0]-=G*mSquare/dCube*dx;
            F[j*3+1]-=G*mSquare/dCube*dy;
            F[j*3+2]-=G*mSquare/dCube*dz;
          }
        }
      }
    #pragma omp for schedule(static)
      for(j=0;j<N;j++)                        /* update all bodies */
      {
        for(k=0;k<3;k++)                      /* x, y, z components */
        {
          a=F[j*3+k]/m;                       /* acceleration */
          V[j*3+k]=V[j*3+k]+a*dt;             /* new velocity */
          C[j*3+k]=C[j*3+k]+V[j*3+k]*dt;      /* new position */
        }
      }
  }
}

The i loop cannot be parallelized because each iteration needs the C array updated by the previous one, so the parallel work starts at the j loop.

Combined version

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
  {
    for(i=0;i<loop;i++)                                 /* time steps, executed by every thread */
    {
      #pragma omp for schedule(static)
        for(j=start;j<end+1;j++)                        /* this process's bodies, split among the threads */
        {
          for(k=0;k<N;k++)                              /* accumulate the force from every other body */
          {
            if(j!=k)
            {
              dx=C[k*3+0]-C[j*3+0];
              dy=C[k*3+1]-C[j*3+1];
              dz=C[k*3+2]-C[j*3+2];

              d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

              dCube=pow(d,3);
              mSquare=pow(m,2);

              F[(j-start)*3+0]-=G*mSquare/dCube*dx;
              F[(j-start)*3+1]-=G*mSquare/dCube*dy;
              F[(j-start)*3+2]-=G*mSquare/dCube*dz;
            }
          }
        }
      #pragma omp for schedule(static)
        for(j=0;j<length;j++)                           /* update the bodies of this process */
        {
          for(k=0;k<3;k++)                              /* x, y, z components */
          {
            a=F[j*3+k]/m;                               /* acceleration */
            V[j*3+k]=V[j*3+k]+a*dt;                     /* new velocity */
            Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;     /* new position, written to the local buffer */
          }
        }
      /* gather the updated positions of all processes into C */
      MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
    }
  }

What I'm trying to achieve here with MPI is to tell each node which part of C to work on, as in the MPI approach above, and then have OpenMP compute the Cw entries corresponding to that part. MPI then gathers the results into the C table with MPI_Allgatherv and the whole thing starts over, for loop iterations. The problem is that when I run the code above, this error shows up:

Assertion failed in file src/mpid/ch3/src/ch3u_request.c at line 572: *(&incomplete) >= 0

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10389 RUNNING AT felix-desktop
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The problem lies in the MPI_Allgatherv line, but I can't figure out what is causing it.
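For reference, the post does not show how start, end, length, counts and displs are set up. A minimal sketch of one possible, evenly-split setup (the names size, rank and base are placeholders, and <stdlib.h> is assumed to be included) would be:

int size, rank;
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);

int base=N/size;                                     /* bodies per process (last one takes the rest) */
int start=rank*base;                                 /* first body of this process                   */
int end=(rank==size-1)?N-1:start+base-1;             /* last body of this process                    */
int length=end-start+1;                              /* number of bodies it owns                     */

int *counts=malloc(size*sizeof(int));                /* receive counts, in doubles                   */
int *displs=malloc(size*sizeof(int));                /* displacements into C                         */
for(int r=0;r<size;r++)
{
  int len=(r==size-1)?N-r*base:base;
  counts[r]=len*3;                                   /* 3 coordinates per body                       */
  displs[r]=r*base*3;
}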

Upvotes: 2

Views: 1932

Answers (1)

dreamcrash

Reputation: 51393

You are calling MPI_Allgatherv inside a parallel region. Consequently, that MPI routine is being called on the same communicator (i.e., MPI_COMM_WORLD) by multiple threads at the same time, which is not allowed unless the MPI library was initialized with the MPI_THREAD_MULTIPLE thread-support level. Nevertheless, you can ensure that the MPI_Allgatherv routine is invoked by a single thread:

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
  ... 
  #pragma omp master              /* only the master thread performs the gather */
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
  #pragma omp barrier             /* the other threads wait until the gather is done */
}

By adding the OpenMP master construct, one ensures that only one thread (namely the master thread) calls MPI_Allgatherv. To make sure that there are no race conditions (i.e., threads changing data that is used by the MPI_Allgatherv call), an OpenMP barrier was added right after it. Hence, all threads of a process wait until the master thread of that process has finished the MPI_Allgatherv call.

If the MPI implementation that you are using provides the thread-support level MPI_THREAD_SERIALIZED, you can use the single construct instead of master. In either case, make sure that the thread-support level of your MPI implementation is at least MPI_THREAD_FUNNELED, which is what calling MPI only from the master thread requires.
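For illustration (this is not part of the original code, and the error message is a placeholder), the required level can be requested and verified at startup with MPI_Init_thread:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int provided;

  /* Request MPI_THREAD_FUNNELED: only the thread that called
     MPI_Init_thread (i.e., the master thread) will make MPI calls. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

  if(provided < MPI_THREAD_FUNNELED)
  {
    fprintf(stderr, "insufficient MPI thread support\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  /* ... the rest of the program ... */

  MPI_Finalize();
  return 0;
}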

Another approach is to move the MPI_Allgatherv call outside of the parallel region:

for(i=0;i<loop;i++)                                 
{
  #pragma omp parallel private(k,dx,dy,dz,d,dCube,mSquare,a)
  {
  ...
  }
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}     

The overhead of repeatedly creating parallel regions should be mitigated by the fact that a decent OpenMP implementation uses a thread pool, so the threads are only initialized once.
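Putting it together with the loops from the question, the second approach would look roughly like this. It is only a sketch: it keeps the question's variables (C, Cw, F, V, start, end, length, counts, displs) and merely moves the parallel region inside the sequential i loop:

for(i=0;i<loop;i++)                                   /* time steps, sequential */
{
  /* the threads share the force and update work for this process's bodies */
  #pragma omp parallel private(k,dx,dy,dz,d,dCube,mSquare,a)
  {
    #pragma omp for schedule(static)
    for(j=start;j<end+1;j++)
    {
      for(k=0;k<N;k++)
      {
        if(j!=k)
        {
          dx=C[k*3+0]-C[j*3+0];
          dy=C[k*3+1]-C[j*3+1];
          dz=C[k*3+2]-C[j*3+2];

          d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

          dCube=pow(d,3);
          mSquare=pow(m,2);

          F[(j-start)*3+0]-=G*mSquare/dCube*dx;
          F[(j-start)*3+1]-=G*mSquare/dCube*dy;
          F[(j-start)*3+2]-=G*mSquare/dCube*dz;
        }
      }
    }
    #pragma omp for schedule(static)
    for(j=0;j<length;j++)
    {
      for(k=0;k<3;k++)
      {
        a=F[j*3+k]/m;
        V[j*3+k]=V[j*3+k]+a*dt;
        Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;
      }
    }
  }                                                   /* implicit barrier: Cw is complete here */

  /* one collective call per process, made outside the parallel region */
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}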

Upvotes: 4
