ftiaronsem

Reputation: 1584

MPI communication stalls when node is only partially reserved

This is a tricky one, so I will try to describe it as accurately as I can. I inherited a Fortran program consisting of a couple of thousand lines of Fortran code (in one subroutine) that uses MPI to parallelize computations. Fortunately it only uses very few MPI commands; here they are:

call mpi_gather(workdr(ipntr(2)),int(icount/total),mpi_double_precision,&
        & workdr(ipntr(2)),int(icount/total),mpi_double_precision,0,mpi_comm_world,mpierr)

call mpi_gather(workdi(ipntr(2)),int(icount/total),mpi_double_precision,&
        & workdi(ipntr(2)),int(icount/total),mpi_double_precision,0,mpi_comm_world,mpierr)

A couple of dozen lines later, this is followed by:

call mpi_bcast(workdr(istart),j,mpi_double_precision,total-1,&
                       & mpi_comm_world,mpiierr)
call mpi_bcast(workdi(istart),j,mpi_double_precision,total-1,&
                       & mpi_comm_world,mpiierr)                                                                                                   

call mpi_bcast(workdr(ipntr(2)),icount,mpi_double_precision,0,mpi_comm_world,mpiierr)
call mpi_bcast(workdi(ipntr(2)),icount,mpi_double_precision,0,mpi_comm_world,mpiierr)

Both of these call sites and the surrounding code are inside an if statement that is evaluated on each rank:

 call znaupd ( ido, bmat, n, which, nev, tol, resid, ncv,&
 &                 v, ldv, iparam, ipntr, workd, workl, lworkl,&
 &                 rworkl,info )
 if (ido .eq. -1 .or. ido .eq. 1) then
 [...code here...]
 [...mpi code here...]
 [...couple of dozen lines...]
 [...mpi code here...]
 [...code here...]
 end if

This code compiles and runs and produces reasonable results (it's a physics simulation). However, it stalls when a node is only partially reserved, i.e. when not all processors on a node are requested for the job.

To be more precise, it is the rank 0 process that stalls. The above code is run in a loop until the if statement evaluates to false. The output shows that the loop runs several times, but then it happens: several of the processes evaluate the if statement as false and exit the loop, while the rank 0 process evaluates it as true and stalls when it calls mpi_gather.
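
To make the situation concrete, here is a stripped-down toy program (not my actual code; every name in it is made up) in which only rank 0 calls mpi_gather, which is essentially what the output suggests is happening in my program. If my understanding of the collective semantics is right, rank 0 should get stuck in the gather whenever this is launched with more than one process:

program collective_mismatch
   ! Toy illustration, not the real code: the decision whether to call
   ! the gather is made rank-dependent on purpose, so only rank 0
   ! reaches the collective.
   use mpi
   implicit none
   integer :: rank, nprocs, ierr
   double precision :: sendval(1)
   double precision, allocatable :: recvbuf(:)

   call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world, rank, ierr)
   call mpi_comm_size(mpi_comm_world, nprocs, ierr)

   allocate(recvbuf(nprocs))
   sendval(1) = dble(rank)

   ! Only rank 0 enters the collective, so it waits for contributions
   ! from the other ranks that never arrive.
   if (rank == 0) then
      call mpi_gather(sendval, 1, mpi_double_precision, &
           &          recvbuf, 1, mpi_double_precision, &
           &          0, mpi_comm_world, ierr)
   end if

   call mpi_finalize(ierr)
end program collective_mismatch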

So you are probably thinking that this cannot be answered without seeing the full code, and that there must be something that causes the if statement to evaluate incorrectly on the rank 0 process.

Consider, however, that it runs fine with an arbitrary number of processors on a single node, and with an arbitrary number of nodes, as long as all processors on each node are reserved.

My thoughts on this so far and my questions:

  1. Are the above MPI calls blocking? My understanding is that these commands are buffered, so that while some processes might continue executing, their messages are saved in the receiver's buffer; messages therefore can't be lost. Is that correct?

  2. Has anybody else ever experienced an issue similar to this, i.e. stalling if not all processors on a node are reserved? I have to admit I am somewhat lost on this. I don't really know where to start debugging, beyond perhaps adding some prints around the collective calls (sketched right below). Any hints and pointers are greatly appreciated.
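
The only concrete first step I can think of is to instrument the loop: print the rank and the iteration immediately before and after each collective call, so the output shows exactly which ranks reach which call when the stall happens. A sketch of what I have in mind around the first gather; iter is a hypothetical loop counter I would add, myrank would come from mpi_comm_rank, and the other names are from the snippets above:

! "iter" is a hypothetical loop counter; "myrank" from mpi_comm_rank
write(*,'(a,i4,a,i8,a,i3)') 'rank ', myrank, ' iter ', iter, &
     & ' before mpi_gather, ido = ', ido
flush(6)

call mpi_gather(workdr(ipntr(2)),int(icount/total),mpi_double_precision,&
        & workdr(ipntr(2)),int(icount/total),mpi_double_precision,0,mpi_comm_world,mpierr)

write(*,'(a,i4,a,i8,a)') 'rank ', myrank, ' iter ', iter, ' after mpi_gather'
flush(6)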

This issue has been reproduced on different clusters with different compilers and different MPI implementations. It really seems to be an issue in the code.

Thanks so much for your help. Any ideas are greatly appreciated.

EDIT: Here are further details of our system:

MPI environment: mvapich2 v1.6
Compiler: Intel ifort 13.2
The ARPACK library used is the standard (serial) ARPACK, not p_arpack; the code takes care of the parallelization itself (to optimize memory usage).

Upvotes: 1

Views: 463

Answers (1)

ftiaronsem

Reputation: 1584

Turns out the problem was not in the code; it was in the MPI implementation! Initially I hadn't thought of this, since I was running the program on two different clusters, using different MPI implementations (mvapich and Intel MPI). However, it turns out that both were derived from the same eight-year-old MPICH implementation. After upgrading to a more recent version of mvapich, which is derived from a more recent version of MPICH, the odd behavior stopped and the code runs as expected.
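
For anyone who runs into something similar on a cluster with several MPI stacks installed: once an MPI-3 capable library is available, you can print at runtime which implementation and version the executable is actually linked against, which makes it easy to verify that the upgrade really took effect. A minimal sketch (note that mpi_get_library_version only exists from MPI-3 on, so an old stack like mvapich2 1.6 would not provide it):

program show_mpi_version
   ! Prints the MPI library version string, so you can verify which
   ! implementation the binary is really linked against at runtime.
   ! Requires an MPI-3 library.
   use mpi
   implicit none
   character(len=mpi_max_library_version_string) :: version
   integer :: resultlen, ierr

   call mpi_init(ierr)
   call mpi_get_library_version(version, resultlen, ierr)
   print *, version(1:resultlen)
   call mpi_finalize(ierr)
end program show_mpi_version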

Thanks again to everybody who provided comments.

Upvotes: 1
