Ayaz Khan
Ayaz Khan

Reputation: 31

MPI - Message Truncation in MPI_Recv

I am having problems in one my project related to MPI development. I am working on the implementation of an RNA parsing algorithm using MPI in which I started the parsing of an input string based on some parsing rules and parsing table (contains different states and related actions) with a master node. In parsing table, there are multiple actions for each state which can be done in parallel. So, I have to distribute these actions among different processes. To do that, I am sending the current state and parsing info (current stack of parsing) to the nodes using separate thread to receive actions from other nodes while the main thread is busy in parsing based on received actions. Following are the code snippets of the sender and receiver:

Sender Code:

StackFlush(&snd_stack);
StackPush(&snd_stack, state_index);
StackPush(&snd_stack, current_ch);
StackPush(&snd_stack, actions_to_skip);
elements_in_stack = stack.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
                StackPush(&snd_stack, stack.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = parse_tree.top + 1;
for(int a=elements_in_stack-1;a>=0;a--)
                StackPush(&snd_stack, parse_tree.contents[a]);
StackPush(&snd_stack, elements_in_stack);
elements_in_stack = snd_stack.top+1;
MPI_Send(&elements_in_stack, 1, MPI_INT, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD);
MPI_Send(&snd_stack.contents[0], elements_in_stack, MPI_CHAR, (myrank + actions_to_skip) % mysize, MSG_ACTION_STACK, MPI_COMM_WORLD);

Receiver Code:

MPI_Recv(&e_count, 1, MPI_INT, MPI_ANY_SOURCE, MSG_ACTION_STACK_COUNT, MPI_COMM_WORLD, &status);
if(e_count == 0){
                break;
}
while((bt_stack.top + e_count) >= bt_stack.maxSize - 1){usleep(500);}
pthread_mutex_lock(&mutex_bt_stack); //using mutex for accessing shared data among threads
MPI_Recv(&bt_stack.contents[bt_stack.top + 1], e_count, MPI_CHAR, status.MPI_SOURCE, MSG_ACTION_STACK, MPI_COMM_WORLD, &status);
bt_stack.top += e_count;
pthread_mutex_unlock(&mutex_bt_stack);

The program is running fine for small input having less communications but as we increase the input size which in response increases the communication so the receiver receives many requests while processing few then it get crashed with the following errors:

Fatal error in MPI_Recv: Message truncated, error stack: MPI_Recv(186) ……………………………………: MPI_Recv(buf=0x5b8d7b1, count=19, MPI_CHAR, src=3, tag=1, MPI_COMM_WORLD, status=0x41732100) failed MPIDI_CH3U_Request_unpack_uebuf(625)L Message truncated; 21 bytes received but buffer size is 19 Rank 0 in job 73 hpc081_56549 caused collective abort of all ranks exit status of rank 0: killed by signal 9.

I have also tried this by using Non-Blocking MPI calls but still the similar errors.

Upvotes: 3

Views: 10140

Answers (1)

Greg Inozemtsev
Greg Inozemtsev

Reputation: 4671

I don't know what the rest of the code looks like, but here's an idea. Since there is a break I'm assuming the receiver code is part of a loop or a switch statement. If that's the case, there is a mismatch between sends and receives when the element count becomes 0:

  1. The sender will send the element count and a zero-length message (the MPI_Send(&snd_stack.contents... line).
  2. There will be no matching receive for this second message because the receiver breaks out of the loop.
  3. The zero-length message will then match something else, possibly causing the error you are seeing down the line.

Upvotes: 1

Related Questions