ATK

Strange occurrence with a send/recv MPI pair

I have an application where the root rank is sending messages to all ranks in the following way:

tag = 22
if (myrankid == 0) then
  do i = 1, nproc
    if (i == 1) then
      do j = 1, nvert
        xyz((j-1)*3+1) = data((j-1)*3+1,1)
        xyz((j-1)*3+2) = data((j-1)*3+2,1)
        xyz((j-1)*3+3) = data((j-1)*3+3,1)
      enddo
    else
      call mpi_send(data, glb_nvert(i)*3, mpi_real, i-1, tag, comm, ierr)
    endif
  enddo
else
  call mpi_recv(data, glb_nvert(i)*3, mpi_real, 0, tag, comm, stat, ierr)
endif

My problem is that, only when running above 3000 ranks, this pair hangs at a certain MPI rank (in my application it is rank 2009).
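One way to narrow this down is to let the receiving rank ask MPI how large the pending message actually is before posting the receive. The sketch below is a standalone toy program, not the application code: the contents of glb_nvert and the buffer buf are made up, and it assumes the count a rank should expect is glb_nvert(myrankid+1). That assumption matters because, in the snippet above, i is only set by the loop on rank 0, so glb_nvert(i) in the receive branch may not refer to the receiving rank's entry.

```fortran
program probe_count
  use mpi
  implicit none
  integer, parameter :: tag = 22
  integer :: comm, ierr, myrankid, nproc, i, incoming
  integer :: stat(mpi_status_size)
  integer, allocatable :: glb_nvert(:)
  real, allocatable :: buf(:)

  call mpi_init(ierr)
  comm = mpi_comm_world
  call mpi_comm_rank(comm, myrankid, ierr)
  call mpi_comm_size(comm, nproc, ierr)

  ! Made-up per-rank vertex counts, filled identically on every rank (assumption).
  allocate(glb_nvert(nproc))
  do i = 1, nproc
     glb_nvert(i) = 10 + mod(i, 7)
  end do
  allocate(buf(3*maxval(glb_nvert)))
  buf = 0.0

  if (myrankid == 0) then
     ! Rank 0 sends each other rank its own number of reals.
     do i = 2, nproc
        call mpi_send(buf, glb_nvert(i)*3, mpi_real, i-1, tag, comm, ierr)
     end do
  else
     ! Ask MPI how large the pending message is before receiving it.
     call mpi_probe(0, tag, comm, stat, ierr)
     call mpi_get_count(stat, mpi_real, incoming, ierr)
     if (incoming /= glb_nvert(myrankid+1)*3) then
        print *, 'rank', myrankid, ': incoming', incoming, &
                 'expected', glb_nvert(myrankid+1)*3
     end if
     call mpi_recv(buf, incoming, mpi_real, 0, tag, comm, stat, ierr)
  end if

  deallocate(buf, glb_nvert)
  call mpi_finalize(ierr)
end program probe_count
```

If the same probe is placed in front of the real mpi_recv, a rank that never returns from mpi_probe has no matching message pending, while a rank that prints a mismatch is posting a receive with a count different from what was sent.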

I have checked that the sizes and arrays are consistent, and the only interesting thing I found was the comm. The comm is a communicator that I duplicated from another MPI communicator.

When I print comm with print*, comm, all ranks except the root print the same integer.

E.g.

The root prints:

-1006632941

while the remaining 2999 ranks print:

-1006632951

Is that really what is causing the problem?

I have tried both Intel MPI and Cray MPI.
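Regarding the differing integers: a communicator handle printed this way is an opaque, implementation-defined value that is only meaningful within the process that holds it, so differing values across ranks do not by themselves indicate a broken communicator. A more direct check, sketched below, is to compare the duplicate against its parent with mpi_comm_compare and to confirm the rank and size seen through comm. Here parent_comm is a hypothetical name standing in for the original communicator, and mpi_comm_world is used only to keep the example self-contained.

```fortran
program check_comm
  use mpi
  implicit none
  integer :: parent_comm, comm, ierr, res, myrank, nranks

  call mpi_init(ierr)

  ! Assumption: mpi_comm_world stands in for the communicator being duplicated.
  parent_comm = mpi_comm_world
  call mpi_comm_dup(parent_comm, comm, ierr)

  call mpi_comm_rank(comm, myrank, ierr)
  call mpi_comm_size(comm, nranks, ierr)

  ! A duplicate has the same group and rank order as its parent but a
  ! different context, which mpi_comm_compare reports as mpi_congruent.
  call mpi_comm_compare(parent_comm, comm, res, ierr)
  if (res /= mpi_congruent) then
     print *, 'rank', myrank, ': comm is not congruent to its parent, result =', res
  end if

  if (myrank == 0) print *, 'comm has', nranks, 'ranks'

  call mpi_comm_free(comm, ierr)
  call mpi_finalize(ierr)
end program check_comm
```

If every rank reports a congruent communicator with the expected size, the printed handle values are probably not the cause of the hang.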
