Rade

Reputation: 1

Ridiculously simple MPI_Send/Recv problem I don't understand

I have two functions with different algorithms. In the first function I implemented non-blocking communication (MPI_Irecv, MPI_Isend) and the program runs without any errors. Even when I change the non-blocking calls to blocking ones, everything is fine; no deadlock. But if I implement the second function with basic blocking communication like this (reduced to the part that causes the problem):

if( my_rank == 0 )
{
  a = 3 ;
  MPI_Send(&a,1,MPI_DOUBLE,1,0,MPI_COMM_WORLD) ;
}
else if( my_rank == 1 )
{
  MPI_Recv(&a,1,MPI_DOUBLE,0,0,MPI_COMM_WORLD, &status ) ;
}
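In isolation this is a textbook blocking exchange; a minimal standalone version (my own scaffolding around the snippet, assuming a is a double and status is an MPI_Status) runs cleanly on two processes:

/* Standalone sketch of just this exchange -- run with at least two processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int my_rank ;
  double a = 0.0 ;
  MPI_Status status ;

  MPI_Init(&argc, &argv) ;
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank) ;

  if( my_rank == 0 )
  {
    a = 3 ;
    MPI_Send(&a,1,MPI_DOUBLE,1,0,MPI_COMM_WORLD) ;
  }
  else if( my_rank == 1 )
  {
    MPI_Recv(&a,1,MPI_DOUBLE,0,0,MPI_COMM_WORLD, &status ) ;
    printf("rank 1 received a = %f\n", a) ;
  }

  MPI_Finalize() ;
  return 0 ;
}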

So in the full program, process 1 should receive the value of a from process 0, but I'm getting this error:

Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(187).......................: MPI_Recv(buf=0xbfbef2a8, count=1, MPI_DOUBLE, src=0, tag=0, MPI_COMM_WORLD, status=0xbfbef294) failed
MPIDI_CH3U_Request_unpack_uebuf(600): Message truncated; 32 bytes received but buffer size is 8
rank 2 in job 39 Blabla caused collective abort of all ranks
exit status of rank 2: killed by signal 9

If I run the program with only one of the two functions, each works as it is supposed to. But both together result in the error message above. I do understand the error message, but I don't know what I can do to prevent it. Can someone explain to me where I have to look for the error? Since I'm not getting a deadlock in the first function, I'm assuming that there can't be an unreceived send left over from the first function that leads to the error in the second.
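One way I could check that assumption (a diagnostic sketch of my own, reusing my_rank, a and status from the snippet above, not part of the actual program) is to probe before the receive and print how large the matched message really is:

if( my_rank == 1 )
{
  int nbytes ;
  MPI_Probe(0, 0, MPI_COMM_WORLD, &status) ;            /* wait for a matching message without receiving it */
  MPI_Get_count(&status, MPI_BYTE, &nbytes) ;           /* size of the pending message in bytes */
  printf("pending message from rank %d, tag %d, %d bytes\n",
         status.MPI_SOURCE, status.MPI_TAG, nbytes) ;   /* 8 expected; 32 would point to a leftover send */
  MPI_Recv(&a,1,MPI_DOUBLE,0,0,MPI_COMM_WORLD, &status ) ;
}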

Upvotes: 0

Views: 4731

Answers (1)

Rade

Reputation: 1

So, here is the first function:

MPI_Type_vector(m,1,m,MPI_DOUBLE, &column_mpi_t ) ;  /* one column of the m x m row-major matrix: m doubles, stride m */
MPI_Type_commit(&column_mpi_t) ;

T = (double**)malloc(m*sizeof(double*)) ;            /* row pointers */
T_data = (double*)malloc(m*m*sizeof(double)) ;       /* contiguous m x m storage */

for(i=0;i<m;i++)
{
  T[i] = &(T_data[i*m]) ;
}

if(my_rank==0)
{
  /* rank 0 distributes column 0 to all other ranks */
  s = &(T[0][0]) ;
  for(i=1;i<p;i++)
  {
    MPI_Send(s,1,column_mpi_t,i,0,MPI_COMM_WORLD) ;
  }
}
for(k=0;k<m-1;k++)
{
  if(k%p != my_rank)
  {
    /* receive column k from the rank that owns it */
    rbuffer = &(T[0][k]) ;
    MPI_Recv(rbuffer,1,column_mpi_t,k%p,0,MPI_COMM_WORLD,&status) ;
  }

  for(j=k+1;j<n;j++)
  {
    if(j%p==my_rank)
    {
      if(j==k+1 && j!=n-1)
      {
        /* owner of column k+1 sends it to every other rank */
        sbuffer = &(T[0][k+1]) ;
        for(i=0;i<p;i++)
        {
          if(i!= (k+1)%p )
            MPI_Send(sbuffer,1,column_mpi_t,i,0,MPI_COMM_WORLD) ;
        }
      }
    }
  }
}

I came to the conclusion that the derived datatype is the origin of my problems. Does somebody see why?

OK, I'm wrong. If I change the MPI datatype in the MPI_Irecv/Isend calls to MPI_DOUBLE, it would match the datatype of the Recv/Send in the second function, so there would be no truncation error. So, no solution....
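For what it's worth, the sizes can be checked directly. This little sketch (my own addition, reusing column_mpi_t from above) prints how many bytes one element of the derived type carries on the wire versus one MPI_DOUBLE:

int vec_bytes, dbl_bytes ;
MPI_Type_size(column_mpi_t, &vec_bytes) ;  /* the vector type packs m doubles -> m*8 bytes */
MPI_Type_size(MPI_DOUBLE, &dbl_bytes) ;    /* 8 bytes */
printf("column_mpi_t carries %d bytes, MPI_DOUBLE carries %d bytes\n",
       vec_bytes, dbl_bytes) ;

With m = 4 that would be exactly the 32 bytes versus 8 bytes reported in the error message.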

Upvotes: 0
