arunmoezhi
arunmoezhi

Reputation: 3200

partition a 2D array row-wise and use allgather?

I have a loop that look like this:

do j=1,100
    do i=1,1000 
      combined_array(i,j)=combined_array(i,j-1)
      call foo(combined_array(i,j))
    enddo
enddo

subroutine foo(x)
    x= somefunction(x)
end subroutine foo

I want to split the computation but there is a dependancy in the columns. If it was not there I could have split the task on the columns and used allgather ( partition a 2D array column-wise and use allgather ) For this loop I can split the tasks on the rows but how do I combine the results using allgather? The chunk each rank gets is not contiguous in memory

Upvotes: 1

Views: 992

Answers (1)

Hristo Iliev
Hristo Iliev

Reputation: 74455

MPI provides the strided vector type for such kind of problems. You would construct such type using a stride that equals to the total column height and a block size equal to the number of rows in the subblock. You would also give this type a count that equals the number of columns in a row. Here is a concrete example: let's say you have a matrix of REAL with nr rows and nc columns and each process holds a subblock with nr1 rows. Then you would do:

integer :: rowtype

call MPI_TYPE_VECTOR(nc, nr1, nr, MPI_REAL, rowtype, ierr)
call MPI_TYPE_COMMIT(rowtype, ierr)

Let's say you would like to receive some data into a subblock that starts at row myrow. Using this new type you can simply do:

call MPI_RECV(array(myrow,1), 1, rowtype, src, tag, &
              MPI_COMM_WORLD, status, ierr)

It works like that - starting at the provided address of the top lef element of the subblock (that is array(myrow,1)) it will put nr1 elements from the received message, then will skip nr - nr1 elements of array, then put nr1 more elements and again skip nr - nr1 elements and so on, nc times.

But there is a problem here. The extent of the rowtype type would be nc*nr elements. You cannot use it like that with MPI_(ALL)GATHERV() since you would be only able to position the beginning of each piece at offsets that are multiple of the type extent, i.e. multiples of nc*nr. To overcome this limitation, MPI allows you to artificially change the extent of the type using MPI_TYPE_CREATE_RESIZED. It takes a type and builds a new one that has the same type map (e.g. will lay out elements in memory using the same "recipe" as the old type), but when computing offsets and other things that depend on the extent of the type, MPI would use the user provided value instead of the real one. What you need to do is to alter the extent of myrow to be equal to the extent of nr1 elements of type MPI_REAL. It is done like this:

integer(kind = MPI_ADDRESS_KIND) :: lb, extent
integer :: rowtype_resized

call MPI_TYPE_GET_EXTENT(MPI_REAL, lb, extent, ierr)
extent = extent * nr1
call MPI_TYPE_CREATE_RESIZED(rowtype, 0, extent, rowtype_resized, ierr)
call MPI_TYPE_COMMIT(rowtype_resized, ierr)

Now you can use rowtype_resized to receive subblocks of nr1 rows and nc columns but you can position them so to start at any row of the big array that is multiple or nr1 and not only on multiples of the total size of array. You can then proceed like this:

call MPI_ALLGATHER(smallarray, nr1*nc, MPI_REAL, &
                   array, 1, rowtype_resized, MPI_COMM_WORLD, ierr)

This will gather the content of the small arrays smallarray (each of nr1 rows and nc columns) into the big arrays array (each of nr1 * #processes rows and nc columns).

You can even be more flexible and register the vector type with a block length of 1 instead of nr1. This will allow you to send a single row. Then you can create a resized type with an extent of one MPI_REAL element and use MPI_(ALL)GATHERV to gather differently sized subblocks.

I know it's a bit tricky, but in time one learns how to master the type system of MPI.

Upvotes: 2

Related Questions