Reputation: 1
Need help with this Fortran MPI problem. I am trying to gather data from different columns of a 2D array. The catch is that not all of the data in each row is used, and the number of columns assigned to each process may not be equal. All processes start with an equivalent global view of the data, each process performs work on specific columns, and finally all processes exchange information so that they share the common view again. The problem is similar to "MPI partition and gather 2D array in Fortran" and "Sending 2D arrays in Fortran with MPI_Gather".
Drawn example: data(8,4) using 3 MPI processes
---------------------
| a1 | b1 | c1 | d1 |
| a2 | b2 | c2 | d2 |
| a3 | b3 | c3 | d3 |
| a4 | b4 | c4 | d4 |
| a5 | b5 | c5 | d5 |
| a6 | b6 | c6 | d6 |
| a7 | b7 | c7 | d7 |
| a8 | b8 | c8 | d8 |
---------------------
Process 1 will get 2 columns to work on, process 2 gets 1 column, and process 3 gets 1 column.
----------- ------ ------
| a1 | b1 | | c1 | | d1 |
| a2 | b2 | | c2 | | d2 |
| a3 | b3 | | c3 | | d3 |
| a4 | b4 | | c4 | | d4 |
| a5 | b5 | | c5 | | d5 |
| a6 | b6 | | c6 | | d6 |
| a7 | b7 | | c7 | | d7 |
| a8 | b8 | | c8 | | d8 |
----------- ------ ------
In the real problem, the actual size is data(200000,59). This is a preallocated chunk of memory of which I only use part of each column (always starting at index 1). For instance, I only need the first 3 values in each column.
----------- ------ ------
| a1 | b1 | | c1 | | d1 |
| a2 | b2 | | c2 | | d2 |
| a3 | b3 | | c3 | | d3 |
| == | == | | == | | == |
| a4 | b4 | | c4 | | d4 |
| a5 | b5 | | c5 | | d5 |
| a6 | b6 | | c6 | | d6 |
| a7 | b7 | | c7 | | d7 |
| a8 | b8 | | c8 | | d8 |
----------- ------ ------
I am trying to create send and receive datatypes that can accomplish this. My best guess so far has been to use MPI_TYPE_VECTOR: MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR)
For this I would use MPI_TYPE_VECTOR(1, 3, 8, MPI_DOUBLE, newtype, ierr). This should allow each process to send the minimal amount of information. With this, I thought I should be able to exchange the data with MPI_ALLGATHERV:
MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, DISPLS, RECVTYPE, COMM, IERROR), where I use MPI_ALLGATHERV(data(1,my_first_col), num_cols_to_be_sent, newtype, data, RECVCOUNT[], DISPLS[], newtype, COMM, IERROR).
From what I can tell, this is the information that should be sent by each process:
Process 1: [a1,a2,a3,b1,b2,b3]
Process 2: [c1,c2,c3]
Process 3: [d1,d2,d3]
The examples I have seen all either use the entire column of data, or the displacement is naturally a multiple of the subarray needed. I cannot get it to unpack into the correct columns. Shouldn't that be possible, since the receiving end has an understanding of the type's size/extent? Granted, I am very confused about the whole extent thing. Any help would be appreciated. The real code is at work, but here is a quick recreation for viewing and comments (it may not compile; it was made quickly).
      MODULE PARALLEL
      INTEGER iproc, nproc, rank, ierr
      INTEGER mylow, myhigh, mysize, ichunk, irem
      INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
      DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
      INTEGER newtype
      END MODULE

      PROGRAM MAIN
      USE PARALLEL
      IMPLICIT NONE
      INCLUDE 'mpif.f'
c     **temp variables
      integer i, j
      integer num_rows,num_cols
      integer used_rows
c     ----setup MPI----
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
      iproc = rank+1            !rank is base 0, rest of fortran base 1
c     ----setup initial data
      integer num_rows=20       !contiguous in memory over rows (ie single column)
      integer num_cols=11       !noncontiguous memory between different columns
      integer
      ALLOCATE (isize(nproc))
      ALLOCATE (idisp(nproc))
      ALLOCATE (ilow(nproc))
      ALLOCATE (ishigh(nproc))
      ALLOCATE (glob_val(num_rows,num_cols))
      glob_val = 1.0*iproc      !set all glob values to process id
      do i=1,num_cols
        do j=1,used_rows
          glob_val(j,i) = iproc+.01*j   !add reference index to used data
        end do
      end do
c     ---setup exchange information
      ichunk = num_cols/nproc
      irem = num_cols -(ichunk*nproc)
      mysize=ichunk
      if(iproc.le.irem) mysize=mysize+1
      mylow=0
      myhigh=0
      do i=1,nproc              !establish global understanding of processes
        mylow=myhigh+1
        myhigh=mylow+ichunk
        if(i.le.irem) myhigh=myhigh+1
        isize(i)=myhigh-mylow+1
        idisp(i)=(mylow-1)      !based on receiving type size/extent
        ilow(i)=mylow
        ihigh(i)=myhigh
      end do
      mylow=ilow(iproc)
      myhigh=ihigh(iproc)
      call MPI_TYPE_VECTOR(1,used_rows,num_rows,MPI_DOUBLE,
     &                     newtype,ierr)
      call MPI_TYPE_COMMIT(newtype,ierr)
c     --- perform exchange based on 'newtype'
      !MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
      !               RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
      !               COMM, IERROR)
      call MPI_ALLGATHERV(glob_val(1,mylow),mysize,newtype
     &                    glob_val,isize,iproc,newtype,
     &                    MPI_COMM_WORLD,ierr)
c     ---print out global results of process 2
      if(iproc.eq.2) then
        do i=1,num_rows
          write(*,*) (glob_val(i,j),j=1,num_cols)
        end do
      end if
      END program
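To make the size/extent distinction concrete, here is a small standalone check (a sketch, separate from the recreation above; it assumes the usual 8-byte DOUBLE PRECISION). The point is that with COUNT=1 the stride of MPI_TYPE_VECTOR never comes into play, so the type's extent is only BLOCKLENGTH doubles, and receive-side displacements are counted in units of that extent rather than in whole columns:

```fortran
      program extent_check
      use mpi
      implicit none
      integer vtype, tsize, ierr
      integer(kind=MPI_ADDRESS_KIND) lb, textent

      call MPI_INIT(ierr)
c     same type as in the question: 1 block of 3 doubles, stride 8
      call MPI_TYPE_VECTOR(1, 3, 8, MPI_DOUBLE_PRECISION, vtype, ierr)
      call MPI_TYPE_COMMIT(vtype, ierr)

      call MPI_TYPE_SIZE(vtype, tsize, ierr)             ! bytes of actual data
      call MPI_TYPE_GET_EXTENT(vtype, lb, textent, ierr) ! span in memory

      write(*,*) 'size   (bytes) =', tsize    ! 24 = 3 * 8-byte doubles
      write(*,*) 'extent (bytes) =', textent  ! also 24: stride ignored for COUNT=1

      call MPI_TYPE_FREE(vtype, ierr)
      call MPI_FINALIZE(ierr)
      end program extent_check
```

Both numbers come out as 24 bytes, while the columns of data(8,4) start 64 bytes apart, so displacements of 0, 2, 3 in units of this extent cannot land at column starts of the full array; that is why the unpacking goes to the wrong place when receiving directly into data.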
Upvotes: 0
Views: 997
Reputation: 995
OK, I got this working in the following way:
1) myhigh = mylow + ichunk - 1, not myhigh = mylow + ichunk
2) used_rows has to be set before the assignment loop
3) Define the actual buffers more explicitly; try:
call MPI_ALLGATHERV(glob_val(:,mylow:myhigh), mysize, newtype, &
glob_val(1:used_rows,:), isize, idisp, newtype, &
MPI_COMM_WORLD, ierr)
full code using gfortran and openmpi:
MODULE PARALLEL
  INTEGER iproc, nproc, rank, ierr
  INTEGER mylow, myhigh, mysize, ichunk, irem
  INTEGER, ALLOCATABLE :: isize(:), idisp(:), ilow(:), ihigh(:)
  DOUBLE PRECISION, ALLOCATABLE :: glob_val(:,:)
  INTEGER newtype
END MODULE

PROGRAM MAIN
  USE PARALLEL
  use mpi
  IMPLICIT NONE
! INCLUDE 'mpif.h'
! **temp variables
  integer i, j
  integer num_rows, num_cols
  integer used_rows

! ----setup MPI----
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  iproc = rank + 1       !rank is base 0, rest of fortran base 1

! ----setup initial data
  num_rows = 8           !contiguous in memory over rows (ie single column)
  num_cols = 4           !noncontiguous memory between different columns
  used_rows = 3          !must be set before the assignment loop below

  ALLOCATE (isize(nproc))
  ALLOCATE (idisp(nproc))
  ALLOCATE (ilow(nproc))
  ALLOCATE (ihigh(nproc))
  ALLOCATE (glob_val(num_rows, num_cols))

! glob_val = 1.0*iproc   !set all glob values to process id
  glob_val = -1.0 * iproc
  do i = 1, num_cols
    do j = 1, used_rows
      glob_val(j,i) = (1.0*iproc) + (.01*j)  !add reference index to used data
    end do
  end do

! ---setup exchange information
  ichunk = num_cols/nproc
  irem = num_cols - (ichunk*nproc)
  mysize = ichunk
  if (iproc.le.irem) mysize = mysize + 1
  mylow = 0
  myhigh = 0
  do i = 1, nproc        !establish global understanding of processes
    mylow = myhigh + 1
    myhigh = mylow + ichunk - 1
    if (i.le.irem) myhigh = myhigh + 1
    isize(i) = myhigh - mylow + 1
    idisp(i) = (mylow-1) !based on receiving type size/extent
    ilow(i) = mylow
    ihigh(i) = myhigh
  end do
  mylow = ilow(iproc)
  myhigh = ihigh(iproc)

! MPI_DOUBLE is the C datatype; MPI_DOUBLE_PRECISION is the
! portable Fortran match for DOUBLE PRECISION data
  call MPI_TYPE_VECTOR(1, used_rows, num_rows, MPI_DOUBLE_PRECISION, &
                       newtype, ierr)
  call MPI_TYPE_COMMIT(newtype, ierr)

  write(*,*) rank, idisp
  write(*,*) rank, isize

! --- perform exchange based on 'newtype'
!MPI_ALLGATHERV(SENDBUF, SENDCOUNT, SENDTYPE,
!               RECVBUF, RECVCOUNT, DISPLS, RECVTYPE,
!               COMM, IERROR)
  call MPI_ALLGATHERV(glob_val(:,mylow:myhigh), mysize, newtype, &
                      glob_val(1:used_rows,:), isize, idisp, newtype, &
                      MPI_COMM_WORLD, ierr)

! ---print out global results of process 2
  if (iproc.eq.2) then
    do i = 1, num_rows
      write(*,*) (glob_val(i,j), j=1, num_cols)
    end do
  end if

  call MPI_Finalize(ierr)
END program
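A further note: the array-section buffers work because Fortran copy-in/copy-out builds a contiguous temporary whose column spacing matches the type's extent. An alternative (a sketch along the same lines, untested; it reuses used_rows, num_rows, glob_val, isize, idisp, and ierr from the program above) is to resize the vector type so its extent spans one full column of glob_val. Receive-side displacements then count whole columns, so the gather can target glob_val directly, and MPI_IN_PLACE sidesteps the MPI rule against overlapping send/receive buffers:

```fortran
! Sketch (untested): resize the column type so its extent equals one
! full column of glob_val; displacements then count whole columns and
! no temporary copies of glob_val are needed.
  integer coltype, fullcol
  integer(kind=MPI_ADDRESS_KIND) lb, dblext, colext

  call MPI_TYPE_VECTOR(1, used_rows, num_rows, MPI_DOUBLE_PRECISION, &
                       coltype, ierr)
  call MPI_TYPE_GET_EXTENT(MPI_DOUBLE_PRECISION, lb, dblext, ierr)
  lb = 0
  colext = num_rows * dblext     ! one full column, unused rows included
  call MPI_TYPE_CREATE_RESIZED(coltype, lb, colext, fullcol, ierr)
  call MPI_TYPE_COMMIT(fullcol, ierr)

! MPI_IN_PLACE: each rank's own columns already sit in glob_val, so an
! in-place gather avoids aliasing the send and receive buffers
! (sendcount and sendtype are ignored when MPI_IN_PLACE is used).
  call MPI_ALLGATHERV(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, &
                      glob_val, isize, idisp, fullcol,    &
                      MPI_COMM_WORLD, ierr)
```

With this, idisp(i) = ilow(i) - 1 is interpreted in units of a full column's extent, which is exactly the displacement the original question was trying to express.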
Upvotes: 0