Reputation: 41
I've inherited a project and I'm trying to wrap my head around how everything works. I've checked all the basic gotchas I've encountered with deadline timers before.
io_service
.run
is called? ☑*** NEW DATA **************
It doesn't just go off in the weeds, it does come back and resume functioning like normally. But it takes somewhere on the order of 5 to 10 minutes. The timer itself seems to be based on the tick count timer example:
http://www.boost.org/doc/libs/1_47_0/doc/html/boost_asio/example/timers/tick_count_timer.cpp
At this point I'm wondering if something is wrong with the timer object itself. I've always used a normal deadline timer. I think it's using the tick_count_timer to be independent of the system clock which is usually slaved to a netclock somewhere.
It's not a case of the timer getting restarted, as there is a seperate path for that and I'm not seeing that printout a second time.
*** END NEW DATA *********
Needless to say this has me stumped.
Here's the method in question:
void project::on_heartbeatsend_timer_elapsed(const boost::system::error_code& e)
{
if (e != boost::asio::error::operation_aborted)
{
if (this->m_state!=session_state::SessionConnected)
{
stop("Heartbeat SEND lost connection");
#ifdef HEARTBEAT_DEBUG
cout << "Heartbeat SEND lost connection to target: "
<< GetIP().c_str()
<< ":"
<< GetPort()
<< endl;
#endif // HEARTBEAT_DEBUG
}
else
{
size_t size = 0;
packet_buffer_t b(construct_api_packet(SESSION_HEARTBEAT, 0, 0,NULL,size,&size));
write(b, size);
m_timerHeartbeatSend.cancel();
size_t cancelled_stuff = m_timerHeartbeatSend.expires_from_now(m_heartbeatFrequency*1000);
m_timerHeartbeatSend.async_wait(strand_.wrap(boost::bind(&project::on_heartbeatsend_timer_elapsed,
shared_from_this(),
boost::asio::placeholders::error)));
#ifdef HEARTBEAT_DEBUG
cout << "Heartbeat SEND to target "
<< GetIP().c_str()
<< ":"
<< GetPort()
<< " Next Heartbeat in: "
<< m_heartbeatFrequency
<< " seconds. cancelled "
<< cancelled_stuff
<< " pending operation."
<< endl;
#endif // HEARTBEAT_DEBUG
}
}
else if (e.value() != 995)
{
stringstream evalSS;
evalSS << e.value();
std::string error_str = std::string("Error in on_heartbeatsend_timer_elapsed: ") + e.message() + " (" + evalSS.str() + ")";
stop(error_str);
#ifdef HEARTBEAT_DEBUG
cout << "Heartbeat "
<< error_str.c_str()
<< " "
<< GetIP().c_str()
<< ":"
<< GetPort()
<< endl;
#endif // HEARTBEAT_DEBUG
}
else
{
#ifdef HEARTBEAT_DEBUG
cout << "Heartbeat Threading ending to target "
<< GetIP().c_str()
<< ":"
<< GetPort()
<< endl;
#endif // HEARTBEAT_DEBUG
}
#ifdef HEARTBEAT_DEBUG
cout << "finished processing this heartbeat for sending to target "
<< GetIP().c_str()
<< ":"
<< GetPort()
<< endl;
#endif
}
The output from the debugging looks like this:
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat SEND to target 127.0.0.1:50647 Next Heartbeat in: 5 seconds. cancelled 0 pending operation.
finished processing this heartbeat for sending to target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat SEND to target 127.0.0.1:50647 Next Heartbeat in: 5 seconds. cancelled 0 pending operation.
finished processing this heartbeat for sending to target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat SEND to target 127.0.0.1:50647 Next Heartbeat in: 5 seconds. cancelled 0 pending operation.
finished processing this heartbeat for sending to target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50647
The Heartbeat RECV timer STARTED message is being kicked off each time a heartbeat packet is received from a C# tester for the C# implementation of this API.
What I would like to do next is examine this further in the debugger. I'm using Visual Studio 2013, but I'm having a hard time making heads or tails of the m_timerHeartbeatSend object.
Added some more debugging that shows the pattern I'm seeing better. The time stamps are tickcounts.
Heartbeat SEND to target 127.0.0.1:50367 Next Heartbeat in: 5 seconds. cancelled
0 pending operation. old experation: 1672369 now expires at: 1677377 current ti
me: 1672377
finished processing this heartbeat for sending to target 127.0.0.1:50367
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1677307
Heartbeat SEND to target 127.0.0.1:50367 Next Heartbeat in: 5 seconds. cancelled
0 pending operation. old experation: 1677377 now expires at: 1682385 current ti
me: 1677385
finished processing this heartbeat for sending to target 127.0.0.1:50367
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1682299
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1687306
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1692314
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1697322
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1702314
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1707321
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1712329
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1717337
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1722344
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1727352
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1732344
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1737351
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1742359
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1747367
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1752374
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1757366
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1762374
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1767382
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1772389
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1777397
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1782405
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1787412
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1792404
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1797412
Heartbeat SEND to target 127.0.0.1:50367 Next Heartbeat in: 5 seconds. cancelled
0 pending operation. old experation: 1682385 now expires at: 1805001 current ti
me: 1800001
finished processing this heartbeat for sending to target 127.0.0.1:50367
Heartbeat RECV timer STARTED (handle_read_header) for target 127.0.0.1:50367 at
1802419
Heartbeat SEND to target 127.0.0.1:50367 Next Heartbeat in: 5 seconds. cancelled
0 pending operation. old experation: 1805001 now expires at: 1810009 current ti
me: 1805009
finished processing this heartbeat for sending to target 127.0.0.1:50367
But now I'm kind of stuck again. I feel like I need to somehow examine the internals of m_timerHeartBeatSend to go any further.
Upvotes: 1
Views: 151
Reputation: 41
Turns out the issue is in the time_traits struct that was defined for this timer. It was using unsigned math, and when the subtracted value was greater than the other value it screwed the timer up all over the place. Eventually it would recover. If you are seeing a similar issue in your code base make sure your traits allow for negative values.
Upvotes: 1