I have two functions with different algorithms. In the first function I implemented non-blocking communication (MPI_Irecv, MPI_Isend) and the program runs without any errors. Even when I change the non-blocking to blocking communication, everything is fine; no deadlock. But if I implement the second function with basic blocking communication like this (I reduced the algorithm to the bare problem):
if( my_rank == 0 )
{
    a = 3 ;
    MPI_Send(&a,1,MPI_DOUBLE,1,0,MPI_COMM_WORLD) ;
}
else if( my_rank == 1 )
{
    MPI_Recv(&a,1,MPI_DOUBLE,0,0,MPI_COMM_WORLD, &status ) ;
}
So, process 1 should receive the value a from process 0. But I'm getting this error:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(187).......................: MPI_Recv(buf=0xbfbef2a8, count=1, MPI_DOUBLE, src=0, tag=0, MPI_COMM_WORLD, status=0xbfbef294) failed
MPIDI_CH3U_Request_unpack_uebuf(600): Message truncated; 32 bytes received but buffer size is 8
rank 2 in job 39 Blabla caused collective abort of all ranks
exit status of rank 2: killed by signal 9
If I run the program with only one of the two functions, each works as it is supposed to. But running both together results in the error message above. I do understand the error message, but I don't know what I can do to prevent it. Can someone explain to me where I have to look for the error? Since I'm not getting a deadlock in the first function, I'm assuming that there can't be an unreceived send from the first function that leads to the error in the second.
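One way I could check which message the MPI_Recv in the second function actually matches is to probe it first and look at its size (a sketch, assuming rank 1 and tag 0 as above):
MPI_Status probe_status ;
int incoming ;
if( my_rank == 1 )
{
    /* Look at the next pending message from rank 0 with tag 0
       without actually receiving it. */
    MPI_Probe(0,0,MPI_COMM_WORLD,&probe_status) ;
    MPI_Get_count(&probe_status,MPI_DOUBLE,&incoming) ;
    /* If this prints more than 1, the receive is matching a send
       from the first function, not the one-double send above. */
    printf("rank 1: next tag-0 message holds %d doubles\n",incoming) ;
}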
So, here is the first function:
/* column_mpi_t describes one column of the m x m row-major matrix:
   m doubles with a stride of m elements. */
MPI_Type_vector(m,1,m,MPI_DOUBLE, &column_mpi_t ) ;
MPI_Type_commit(&column_mpi_t) ;

/* T is an array of row pointers into the contiguous block T_data. */
T = (double**)malloc(m*sizeof(double*)) ;
T_data = (double*)malloc(m*m*sizeof(double)) ;
for(i=0;i<m;i++)
{
    T[i] = &(T_data[i*m]) ;
}

/* Rank 0 sends column 0 to every other rank. */
if(my_rank==0)
{
    s = &(T[0][0]) ;
    for(i=1;i<p;i++)
    {
        MPI_Send(s,1,column_mpi_t,i,0,MPI_COMM_WORLD) ;
    }
}

for(k=0;k<m-1;k++)
{
    /* Every rank that does not own column k receives it from its owner. */
    if(k%p != my_rank)
    {
        rbuffer = &(T[0][k]) ;
        MPI_Recv(rbuffer,1,column_mpi_t,k%p,0,MPI_COMM_WORLD,&status) ;
    }
    for(j=k+1;j<n;j++)
    {
        if(j%p==my_rank)
        {
            /* The owner of column k+1 sends it to all other ranks. */
            if(j==k+1 && j!=n-1)
            {
                sbuffer = &(T[0][k+1]) ;
                for(i=0;i<p;i++)
                {
                    if(i!= (k+1)%p )
                        MPI_Send(sbuffer,1,column_mpi_t,i,0,MPI_COMM_WORLD) ;
                }
            }
        }
    }
}
I came to the conclusion that the derived datatype is the origin of my problems. Does somebody see why?
OK, I'm wrong. If I change the MPI datatype in MPI_Irecv/MPI_Isend to MPI_DOUBLE, it would match the datatypes of the Recv/Send in the second function, so there would be no truncation error. So, no solution...
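For reference, the number of bytes carried by one column_mpi_t element can be checked directly (a sketch; if m is 4 here, MPI_Type_size should report the 32 bytes from the error message):
int type_size ;
MPI_Aint lb, extent ;
MPI_Type_size(column_mpi_t,&type_size) ;
MPI_Type_get_extent(column_mpi_t,&lb,&extent) ;
/* type_size is m*sizeof(double): the data bytes that one send with
   count 1 and column_mpi_t actually transfers. */
printf("column_mpi_t carries %d bytes, extent %ld bytes\n",
       type_size,(long)extent) ;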