Reputation: 2571
I want to find the cause of an error in an MPI program. The program is one big while loop; in each iteration, every processor exchanges a set of messages with its neighbors using ISEND and IRECV, as follows:
while ( t < a very large number ) ...
  do i=1,8
    if ( something that is almost always true ) then
      call MPI_ISEND(A,A_buffer,inewtype,neighrank(i),2,MPI_COMM_WORLD,isend,ierr)
      call MPI_WAIT(isend,istatus,ierr)
      call MPI_ISEND(B,B_buffer,MPI_INTEGER4,neighrank(i),3,MPI_COMM_WORLD,isend,ierr)
      call MPI_WAIT(isend,istatus,ierr)
    end if
  end do
  do i=1,8
    if ( something that is almost always true ) then
      call MPI_IRECV(C,C_buffer,inewtype,neighrank(i),2,MPI_COMM_WORLD,irecv,ierr)
      call MPI_WAIT(irecv,istatus,ierr)
      call MPI_IRECV(D,D_buffer,MPI_INTEGER4,neighrank(i),3,MPI_COMM_WORLD,irecv,ierr)
      call MPI_WAIT(irecv,istatus,ierr)
    end if
  end do
...
The program produces a segmentation fault error after a very large number of iterations. In each iteration the same amount of data is passed among the processors, but the number of calls to ISEND and IRECV is adjustable (e.g. use 80 calls to pass 80kb total or 40 calls to pass 160kb total). With fewer calls, the program crashes earlier.
I suspect that something about InfiniBand is causing this error, but I do not get an "insufficient virtual memory" message, so perhaps it is not InfiniBand after all? What else could cause this error?
Upvotes: 0
Views: 411
Reputation: 2571
The MPI code turned out to be fine. It was hard to tell because the program runs for 1-2 hours before hitting the segmentation fault. Rigorous debugging pointed to a bug unrelated to MPI.
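Separately from the crash, the loop in the question waits on each ISEND right after posting it, so every rank completes its sends before any receive is posted. The MPI standard allows a standard-mode send to wait for a matching receive, so that ordering can stall once messages are too large to be buffered. A minimal sketch of the more usual pattern, posting all receives first and completing everything with a single MPI_WAITALL, might look like the following. The two-neighbor ring, the names sendbuf, recvbuf and requests, and the message length n are assumptions for illustration only; they are not taken from the original program.

```fortran
program halo_exchange_sketch
  use mpi
  implicit none
  integer, parameter :: nneigh = 2   ! assumption: left and right neighbor on a ring
  integer, parameter :: n = 4        ! assumption: integers per message
  integer :: sendbuf(n, nneigh), recvbuf(n, nneigh)
  integer :: neighrank(nneigh), requests(2*nneigh)
  integer :: statuses(MPI_STATUS_SIZE, 2*nneigh)
  integer :: rank, nprocs, i, ierr

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  neighrank(1) = mod(rank - 1 + nprocs, nprocs)
  neighrank(2) = mod(rank + 1, nprocs)
  sendbuf = rank      ! every outgoing message carries the sender's rank
  recvbuf = -1

  ! Post all receives first, each into its own buffer column,
  ! so no two pending receives share memory.
  do i = 1, nneigh
     call MPI_IRECV(recvbuf(1, i), n, MPI_INTEGER, neighrank(i), 2, &
                    MPI_COMM_WORLD, requests(i), ierr)
  end do

  ! Then post all sends without waiting on them individually.
  do i = 1, nneigh
     call MPI_ISEND(sendbuf(1, i), n, MPI_INTEGER, neighrank(i), 2, &
                    MPI_COMM_WORLD, requests(nneigh + i), ierr)
  end do

  ! Complete every send and receive in one call.
  call MPI_WAITALL(2*nneigh, requests, statuses, ierr)

  ! Each column should now hold the rank of the neighbor it came from.
  do i = 1, nneigh
     if (any(recvbuf(:, i) /= neighrank(i))) then
        print *, 'rank', rank, ': unexpected data from', neighrank(i)
     end if
  end do

  call MPI_FINALIZE(ierr)
end program halo_exchange_sketch
```

The send and receive buffers must stay untouched until MPI_WAITALL returns, which is why each neighbor gets its own column instead of reusing a single array as in the question's loop.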
Upvotes: 1