Reputation: 77
Suppose that k processes each compute the elements of a matrix A whose dimension is (n,m), where n is the number of rows and m is the number of columns. I am trying to use MPI_GATHER to gather these matrices into the matrix B at the root process, where the dimension of B is (n,km). To be more specific, I wrote an example Fortran code below. Here, I am passing the columns of the matrix A row by row (not the entire matrix) to the matrix B, but this does not work. When I run the executable using mpirun -n 2 a.out, I get the error:
malloc: *** error for object 0x7ffa89413fb8: incorrect checksum for freed object - object was probably modified after being freed.
1) Why do I get this error message?
2) Could someone please explain conceptually why I would have to use MPI_TYPE_VECTOR?
3) How should I correct the MPI_GATHER part of the code? Can I pass the entire matrix A?
PROGRAM test
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER, PARAMETER :: n=100, m=100
  INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A
  INTEGER, DIMENSION(n,m) :: B
  INTEGER :: ind_a, ind_c
  INTEGER :: NUM_PROC, PROC_ID, IERROR, MASTER_ID=0
  INTEGER :: c
  INTEGER, DIMENSION(m) :: cvec

  CALL MPI_INIT(IERROR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, PROC_ID, IERROR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NUM_PROC, IERROR)

  ALLOCATE(A(n,m/NUM_PROC))

  DO ind_c=1,m
    cvec(ind_c)=ind_c
  END DO

  ! Fill in matrix A
  DO ind_a=1,n
    DO ind_c=1,m/NUM_PROC
      c=cvec(ind_c+PROC_ID*m/NUM_PROC)
      A(ind_a,ind_c)=c*ind_a
    END DO
  END DO

  ! Gather the elements at the root process
  DO ind_a=1,n
    CALL MPI_GATHER(A(ind_a,:),m/NUM_PROC,MPI_INTEGER, &
                    B(ind_a,PROC_ID*m/NUM_PROC+1:(PROC_ID+1)*m/NUM_PROC), &
                    m/NUM_PROC,MPI_INTEGER,MASTER_ID,MPI_COMM_WORLD,IERROR)
  END DO

  CALL MPI_FINALIZE(IERROR)
END PROGRAM
Upvotes: 1
Views: 289
Reputation: 98
There are two kinds of gather operation that can be performed on a two-dimensional array: 1. gathering the elements along dimension 2 from all processes and collecting them along dimension 2 of one process; and 2. gathering the elements along dimension 2 from all processes and collecting them along dimension 1 of one process.
In this example, n is dimension 1 and m is dimension 2, and Fortran is column-major. Hence, dimension 1 is contiguous in memory.
In your gather statement you are trying to gather dimension 2 of array A from all the processes and collect it into dimension 2 of array B on the MASTER_ID process (type 1). Since dimension 2 is non-contiguous in memory, the sections A(ind_a,:) and B(ind_a,...) are strided, and because mpif.h provides no explicit interface, the compiler typically passes contiguous temporary copies instead. The temporary for B holds only m/NUM_PROC elements, but the root receives m/NUM_PROC elements from each of the NUM_PROC processes, so MPI writes past its end. That overrun corrupts the heap, which is what the malloc error reports.
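To make the layout argument concrete, here is a small sketch in plain Fortran (no MPI) of how a 2x3 array sits in memory. The program name layout_demo and the sample values are made up for illustration only.

```fortran
PROGRAM layout_demo
  IMPLICIT NONE
  INTEGER, DIMENSION(2,3) :: A

  ! RESHAPE fills A in memory (column-major) order: 1,2 | 3,4 | 5,6
  A = RESHAPE([1,2,3,4,5,6], [2,3])

  ! A column occupies adjacent memory positions (here positions 1 and 2)
  PRINT *, 'column 1, A(:,1):', A(:,1)

  ! A row picks every second element (here positions 1, 3 and 5)
  PRINT *, 'row 1,    A(1,:):', A(1,:)
END PROGRAM layout_demo
```

The row holds the values 1, 3, 5, that is, elements that are n = 2 integers apart in memory, which is why a row section such as A(ind_a,:) is not a contiguous buffer.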
A single MPI_GATHER call, as shown below, performs the required operation without the loop used in your code:
CALL MPI_GATHER(A, n*(m/NUM_PROC), MPI_INTEGER, &
B, n*(m/NUM_PROC), MPI_INTEGER, MASTER_ID, &
MPI_COMM_WORLD, IERROR)
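For completeness, here is a minimal sketch of the whole program with that single call in place. It assumes that NUM_PROC divides m evenly; the program name gather_blocks, the helper variable ncols and the final PRINT check are additions for illustration and are not part of the original code.

```fortran
PROGRAM gather_blocks
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER, PARAMETER :: n=100, m=100
  INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A
  INTEGER, DIMENSION(n,m) :: B
  INTEGER :: NUM_PROC, PROC_ID, IERROR, MASTER_ID=0
  INTEGER :: ind_a, ind_c, ncols

  CALL MPI_INIT(IERROR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, PROC_ID, IERROR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NUM_PROC, IERROR)

  ! Columns owned by each process (assumption: NUM_PROC divides m)
  ncols = m/NUM_PROC
  ALLOCATE(A(n,ncols))

  ! Fill the local block: global column index times row index
  DO ind_c=1,ncols
    DO ind_a=1,n
      A(ind_a,ind_c) = (ind_c + PROC_ID*ncols)*ind_a
    END DO
  END DO

  ! Each local A is one contiguous block of n*ncols integers, and the
  ! blocks land back to back in B, i.e. in consecutive groups of columns
  CALL MPI_GATHER(A, n*ncols, MPI_INTEGER, &
                  B, n*ncols, MPI_INTEGER, MASTER_ID, &
                  MPI_COMM_WORLD, IERROR)

  ! Simple check on the root: the last element should equal n*m
  IF (PROC_ID == MASTER_ID) PRINT *, 'B(n,m) =', B(n,m)

  DEALLOCATE(A)
  CALL MPI_FINALIZE(IERROR)
END PROGRAM gather_blocks
```

This works because whole columns are contiguous in Fortran, so the block from process r fills columns r*ncols+1 to (r+1)*ncols of B directly.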
However, if you are attempting to gather elements from dimension 2 of array A on all the processes into dimension 1 of array B on the MASTER_ID process, that is when we need MPI_TYPE_VECTOR, which creates a new type describing the non-contiguous elements. Let me know if that is the intention, because the current code logic does not look like it needs MPI_TYPE_VECTOR.
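To illustrate what MPI_TYPE_VECTOR does conceptually, here is a hedged sketch that describes one row of A as a derived type and gathers the matrix row by row. It is not needed for this problem and is slower than the single call; it only shows how a strided type is built. The names gather_rows, rowtype, rowbuf and ncols are hypothetical, the small sizes n=4 and m=6 are chosen for illustration, and NUM_PROC is assumed to divide m.

```fortran
PROGRAM gather_rows
  IMPLICIT NONE
  INCLUDE "mpif.h"
  INTEGER, PARAMETER :: n=4, m=6
  INTEGER, ALLOCATABLE, DIMENSION(:,:) :: A
  INTEGER, DIMENSION(n,m) :: B
  INTEGER, DIMENSION(m) :: rowbuf      ! contiguous receive buffer for one row
  INTEGER :: NUM_PROC, PROC_ID, IERROR, MASTER_ID=0
  INTEGER :: ind_a, ind_c, ncols, rowtype

  CALL MPI_INIT(IERROR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, PROC_ID, IERROR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NUM_PROC, IERROR)

  ncols = m/NUM_PROC                   ! assumption: NUM_PROC divides m
  ALLOCATE(A(n,ncols))
  DO ind_c=1,ncols
    DO ind_a=1,n
      A(ind_a,ind_c) = (ind_c + PROC_ID*ncols)*ind_a
    END DO
  END DO

  ! One row of A: ncols blocks of 1 integer, each n integers apart
  CALL MPI_TYPE_VECTOR(ncols, 1, n, MPI_INTEGER, rowtype, IERROR)
  CALL MPI_TYPE_COMMIT(rowtype, IERROR)

  DO ind_a=1,n
    ! Send one strided row starting at A(ind_a,1); the root receives
    ! ncols plain integers per process into the contiguous rowbuf
    CALL MPI_GATHER(A(ind_a,1), 1, rowtype, &
                    rowbuf, ncols, MPI_INTEGER, MASTER_ID, &
                    MPI_COMM_WORLD, IERROR)
    IF (PROC_ID == MASTER_ID) B(ind_a,:) = rowbuf
  END DO

  CALL MPI_TYPE_FREE(rowtype, IERROR)
  DEALLOCATE(A)
  CALL MPI_FINALIZE(IERROR)
END PROGRAM gather_rows
```

Passing the single element A(ind_a,1) together with the derived type avoids the temporary copy that an array section such as A(ind_a,:) would trigger, since MPI itself walks the strided memory starting from that address.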
Upvotes: 1