Siva Srinivas Kolukula

Reputation: 1251

MPI in Fortran gives garbage values

PROGRAM ShareNeighbors
IMPLICIT REAL (a-h,o-z)
INCLUDE "mpif.h"
PARAMETER (m = 500, n = 500)
DIMENSION a(m,n), b(m,n)
DIMENSION h(m,n)
INTEGER istatus(MPI_STATUS_SIZE)
INTEGER iprocs, jprocs 
PARAMETER (ROOT = 0) 
integer dims(2),coords(2)
logical   periods(2)
data periods/2*.false./
integer status(MPI_STATUS_SIZE)
integer comm2d,req,source

CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
! Get a new communicator for a decomposition of the domain.  
! Let MPI find a "good" decomposition
dims(1) = 0
dims(2) = 0
CALL MPI_DIMS_CREATE(nprocs,2,dims,ierr)
if (myrank.EQ.Root) then
   print *,nprocs,'processors have been arranged into',dims(1),'X',dims(2),'grid'
endif
CALL MPI_CART_CREATE(MPI_COMM_WORLD,2,dims,periods,.true., &
                  comm2d,ierr)
!   Get my position in this communicator
CALL MPI_COMM_RANK(comm2d,myrank,ierr)
! Get the decomposition
CALL fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
! print *,ista,jsta,iend,jend
ilen = iend - ista + 1
jlen = jend - jsta + 1

CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
iprocs = dims(1)
jprocs = dims(2)
myranki = coords(1)
myrankj = coords(2)

DO j = jsta, jend
    DO i = ista, iend
    a(i,j) = myrank+1
    ENDDO
ENDDO
! Send data from each processor to Root
call MPI_ISEND(ista,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
                  Root,1,MPI_COMM_WORLD,req,ierr )
!    Receive the results from the other processors
if (myrank.EQ.Root) then
    do source = 0,nprocs-1
       call MPI_RECV(ista,1,MPI_INTEGER,source,   &
                     1,MPI_COMM_WORLD,status,ierr )
       call MPI_RECV(iend,1,MPI_INTEGER,source,   &
                     1,MPI_COMM_WORLD,status,ierr )
       call MPI_RECV(jsta,1,MPI_INTEGER,source,   &
                     1,MPI_COMM_WORLD,status,ierr )
       call MPI_RECV(jend,1,MPI_INTEGER,source,   &
                    1,MPI_COMM_WORLD,status,ierr )      
        ilen = iend - ista + 1
        jlen = jend - jsta + 1                          
       call MPI_RECV(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL,   &
                    source,1,MPI_COMM_WORLD,status,ierr)
! print the results
       call ZMINMAX(m,n,ista,iend,jsta,jend,a(:,:),amin,amax)
       print *, 'myid=',source,amin,amax
        call MPI_Wait(req, status, ierr) 
   enddo    
endif

CALL MPI_FINALIZE(ierr)
END

subroutine fnd2ddecomp(comm2d,m,n,ista,iend,jsta,jend)
integer   comm2d
integer   m,n,ista,jsta,iend,jend
integer   dims(2),coords(2),ierr
logical   periods(2)
! Get (i,j) position of a processor from Cartesian topology.
CALL MPI_Cart_get(comm2d,2,dims,periods,coords,ierr)
! Decomposition in first (ie. X) direction
CALL MPE_DECOMP1D(m,dims(1),coords(1),ista,iend)
! Decomposition in second (ie. Y) direction
CALL MPE_DECOMP1D(n,dims(2),coords(2),jsta,jend)

return
end
SUBROUTINE MPE_DECOMP1D(n,numprocs,myid,s,e)
integer n,numprocs,myid,s,e,nlocal,deficit
nlocal  = n / numprocs
s       = myid * nlocal + 1
deficit = mod(n,numprocs)
s       = s + min(myid,deficit)
! Give one more slice to processors
if (myid .lt. deficit) then
    nlocal = nlocal + 1
endif
e = s + nlocal - 1
if (e .gt. n .or. myid .eq. numprocs-1) e = n

return
end
SUBROUTINE ZMINMAX(IX,JX,SX,EX,SY,EY,ZX,ZXMIN,ZXMAX)

INTEGER :: IX,JX,SX,EX,SY,EY
REAL :: ZX(IX,JX)
REAL :: ZXMIN,ZXMAX

ZXMIN=1000.
ZXMAX=-1000.
DO II=SX,EX
   DO JJ=SY,EY  
      IF(ZX(II,JJ).LT.ZXMIN)ZXMIN=ZX(II,JJ)
      IF(ZX(II,JJ).GT.ZXMAX)ZXMAX=ZX(II,JJ)
   ENDDO
ENDDO   

RETURN
END

When I run the above code with 4 processors, Root receives garbage values, whereas with 15 processors the data transfer works correctly. How can I tackle this? I guess it is related to buffering, a point which is not clear to me. How do I handle the buffer properly?

Upvotes: 0

Views: 372

Answers (1)

Problem 1

You are posting multiple non-blocking sends:

call MPI_ISEND(ista,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1,  &
                  MPI_COMM_WORLD,req,ierr)
call MPI_ISEND(a(ista:iend,jsta:jend),(ilen)*(jlen),MPI_REAL, &
                  Root,1,MPI_COMM_WORLD,req,ierr )

and all of them use the same request variable req. That cannot work: each new MPI_ISEND overwrites the previous request handle, so the earlier sends can never be completed properly.
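
A minimal sketch of the fix, reusing the names from the question (Root, ista, iend, jsta, jend) and introducing an array reqs just for illustration: give every non-blocking send its own request handle, for instance by collecting the handles in an array.

! Sketch: one request handle per MPI_ISEND (Root, ista, ... as in the question).
INTEGER :: reqs(5)

call MPI_ISEND(ista,1,MPI_INTEGER,Root,1,MPI_COMM_WORLD,reqs(1),ierr)
call MPI_ISEND(iend,1,MPI_INTEGER,Root,1,MPI_COMM_WORLD,reqs(2),ierr)
call MPI_ISEND(jsta,1,MPI_INTEGER,Root,1,MPI_COMM_WORLD,reqs(3),ierr)
call MPI_ISEND(jend,1,MPI_INTEGER,Root,1,MPI_COMM_WORLD,reqs(4),ierr)
! The array send gets reqs(5) once its buffer problem is fixed (see Problem 2).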

Problem 2

You are using a subarray a(ista:iend,jsta:jend) in a non-blocking MPI call. That is not allowed*. You need to copy the data into a temporary contiguous buffer, or use an MPI derived subarray datatype (too hard for you at this stage).

The reason for the problem is that the compiler creates a temporary copy of the subarray just for the call to MPI_ISEND. MPI_ISEND remembers the address of that temporary but does not necessarily send anything yet. The temporary is then destroyed when the call returns and its address becomes invalid, so the later MPI_Wait (and the actual transfer) will use a dangling address and fail.
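
One way around it, sketched here under the assumption that the local block fits comfortably in memory (sendbuf is a name introduced only for this sketch), is to copy the block into a contiguous buffer under your own control and keep that buffer alive until the send has completed:

! Sketch: copy the local block into a contiguous buffer that outlives the send.
REAL, ALLOCATABLE :: sendbuf(:,:)

allocate(sendbuf(ilen,jlen))
sendbuf = a(ista:iend,jsta:jend)      ! explicit copy under our control
call MPI_ISEND(sendbuf,ilen*jlen,MPI_REAL,Root,1,MPI_COMM_WORLD,reqs(5),ierr)
! sendbuf may only be reused or deallocated after this request has completed.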

Problem 3

Your MPI_Wait is in the wrong place. It must come after the sends, outside of any if conditions, so that it is always executed (provided you always send).

You must collect all the requests separately and then wait for all of them. It is best to store them in an array and wait for them all at once with MPI_WAITALL.

Remember that MPI_ISEND typically does not transfer anything immediately when the buffer is large; the actual exchange often happens during the wait operation, at least for larger arrays.
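
Putting the two sketches above together, every rank that posts sends (here: all of them) completes its own requests unconditionally, outside the if (myrank.EQ.Root) block. A sketch, with stats introduced only for this example:

! Sketch: complete all pending sends at once, on every rank.
INTEGER :: stats(MPI_STATUS_SIZE,5)

call MPI_WAITALL(5,reqs,stats,ierr)
! Only now is it safe to reuse or deallocate the send buffers (e.g. sendbuf).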


Recommendation:

Take a simple example problem and try to exchange just two small arrays with MPI_IRECV and MPI_ISEND between two processes, as simple a test problem as you can manage. Learn from it and proceed in small steps. Take no offence, but your current understanding of non-blocking MPI is too weak to write full-scale programs. MPI is hard, and non-blocking MPI is even harder.
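
A minimal self-contained version of that exercise could look like the following sketch (two ranks swap one small array each; all names are invented for the example and error handling is omitted):

! Minimal two-process non-blocking exchange: rank 0 and rank 1 swap small arrays.
PROGRAM TwoRankExchange
IMPLICIT NONE
INCLUDE "mpif.h"
INTEGER, PARAMETER :: n = 4
REAL :: sendbuf(n), recvbuf(n)
INTEGER :: myrank, nprocs, other, ierr
INTEGER :: reqs(2), stats(MPI_STATUS_SIZE,2)

CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

IF (nprocs == 2) THEN
   other   = 1 - myrank
   sendbuf = REAL(myrank + 1)

   ! Post the receive first, then the send; complete both before touching the data.
   CALL MPI_IRECV(recvbuf, n, MPI_REAL, other, 0, MPI_COMM_WORLD, reqs(1), ierr)
   CALL MPI_ISEND(sendbuf, n, MPI_REAL, other, 0, MPI_COMM_WORLD, reqs(2), ierr)
   CALL MPI_WAITALL(2, reqs, stats, ierr)

   PRINT *, 'rank', myrank, 'received', recvbuf
END IF

CALL MPI_FINALIZE(ierr)
END PROGRAM TwoRankExchange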


* Not allowed when using the interface available in MPI-2. MPI-3 brings a new interface, enabled with use mpi_f08, under which it is possible. But learn the basics first.
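
For completeness, a sketch of what that would look like (a fragment only, assuming an MPI-3 library whose mpi_f08 module reports MPI_SUBARRAYS_SUPPORTED as true):

! Sketch (fragment): with the mpi_f08 bindings the subarray can be passed directly.
use mpi_f08
type(MPI_Request) :: req

call MPI_Isend(a(ista:iend,jsta:jend), ilen*jlen, MPI_REAL, Root, 1, &
               MPI_COMM_WORLD, req)
call MPI_Wait(req, MPI_STATUS_IGNORE)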

Upvotes: 2
