Reputation: 397
I am trying to build a simulation with the Boost.MPI library, but I have run into a problem with asynchronous communication between processes. In our case, there are two processes that send messages to and receive messages from each other (using the isend and irecv functions). If I wait for all the receive requests to complete, everything is OK. This is my working code:
boost::mpi::communicator* comm;
// Initialize MPI and etc.
...
std::vector<boost::mpi::request> sendRequests;
std::vector<boost::mpi::request> receiveRequests;
for (int i = 0; i < 10; i++) {
    receiveRequests.push_back(comm->irecv(0, 3000, receivedMessage));
    sendRequests.push_back(comm->isend(1, 3000, sentMessage));
    boost::mpi::wait_all(receiveRequests.begin(), receiveRequests.end());
    receiveRequests.clear();
}
However, I want to cancel a receive if it takes too long. So I check whether the communication has completed using the test function, and call cancel if it has not. I modified my code as follows:
boost::mpi::communicator* comm;
// Initialize MPI and etc.
...
std::vector<boost::mpi::request> sendRequests;
std::vector<boost::mpi::request> receiveRequests;
for (int i = 0; i < 10; i++) {
    receiveRequests.push_back(comm->irecv(0, 3000, receivedMessage));
    sendRequests.push_back(comm->isend(1, 3000, sentMessage));
    vector<boost::mpi::request>::iterator it = receiveRequests.begin();
    while (it != receiveRequests.end()) {
        if (!((*it).test()))
            (*it).cancel();
        receiveRequests.erase(it);
    }
}
Now, my program crashes and I get this error after the first iteration of the loop:
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_fill_insert
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
what(): MPI_Test: Message truncated, error stack:
PMPI_Test(168)....................: MPI_Test(request=0x13bba24, flag=0x7fff081a7bd4, status=0x7fff081a7ba0) failed
MPIR_Test_impl(63)................:
MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 3000 truncated; 670 bytes received but buffer size is 577
So, I'd like to know how to resolve this error.
Upvotes: 1
Views: 556
Reputation: 397
Finally, I figured it out. The cause was a race condition between the test and cancel methods. There are hundreds of message requests at run time, so occasionally a request completes right after test reports it as unfinished but before cancel is called, and the program then tries to cancel a request that has already finished. That is why the crash occurred irregularly. In the end I changed my approach and removed the cancel call.
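For illustration, one way to give up on a slow receive without calling cancel is to poll test until a deadline and simply leave the request pending if the deadline passes. The sketch below is an assumption about how that could look, not the code I ended up with: poll_until and FakeRequest are hypothetical names, and FakeRequest only stands in for boost::mpi::request so the example compiles without MPI. With a real boost::mpi::request the same template should work, since its test() returns a boost::optional<status> that is truthy once the request has completed.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for boost::mpi::request (not part of Boost).
// It reports completion once test() has been called polls_until_done times.
struct FakeRequest {
    int polls_until_done;
    bool test() { return --polls_until_done <= 0; }
};

// Polls req.test() until it reports completion or the timeout elapses.
// Returns true if the request completed in time, false otherwise.
// cancel() is never called, so there is no test/cancel window to race in;
// a request that timed out simply stays pending and can be tested again later.
template <class Request>
bool poll_until(Request& req, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if (req.test())
            return true;
        // Back off briefly instead of spinning at full speed.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (std::chrono::steady_clock::now() < deadline);
    return false;
}
```

A request that times out this way still owns its receive buffer, so the buffer must not be reused until the request eventually completes.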
Upvotes: 0
Reputation: 393593
Where does the iterator it come from? It isn't declared anywhere in the code shown.
Note that push_back can reallocate, which invalidates any existing iterators into the vector.
Also note that you should increment it only when you did not erase an element. When you do erase, the typical pattern is
it = receiveRequests.erase(it);
Update: I see you have added information to the question. It should probably be:
vector<boost::mpi::request>::iterator it = receiveRequests.begin();
while (it != receiveRequests.end()) {
    if (!((*it).test()))
        (*it).cancel();
    it = receiveRequests.erase(it);
}
I'm not sure why you always erase every receive request; I'm assuming that's the intent.
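If the intent is instead to drop only the requests that have completed and keep the pending ones, the conditional-increment pattern could look roughly like the sketch below. The names drop_completed and StubRequest are hypothetical; StubRequest is just a placeholder so the example builds without MPI, and I'm assuming a real boost::mpi::request would slot in because its test() result is truthy on completion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for a request type (not part of Boost).
// `done` plays the role of test() reporting completion.
struct StubRequest {
    int id;
    bool done;
    bool test() const { return done; }
};

// Erases every request whose test() reports completion and keeps the rest.
// Returns the number of erased requests.
template <class Request>
std::size_t drop_completed(std::vector<Request>& requests) {
    std::size_t removed = 0;
    for (auto it = requests.begin(); it != requests.end(); ) {
        if (it->test()) {
            // erase() invalidates `it` and returns the next valid iterator.
            it = requests.erase(it);
            ++removed;
        } else {
            // Only advance manually when nothing was erased.
            ++it;
        }
    }
    return removed;
}
```

The loop never touches an iterator after it has been passed to erase, which is the part the original code got wrong.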
Upvotes: 1