I'm writing a fairly big network simulator in C++. I tested individual pieces regularly as I developed them, and after putting everything together the simulator seems to work as long as the load I impose on it is not too big. (It's a P2P content distribution simulator, so the more distinct "contents" I introduce, the more data transfers the simulator has to handle.) Any number of contents above a certain threshold results in an abrupt SIGSEGV after several minutes of smooth running. I assumed a memory leak was eventually growing large enough to mess things up, but a valgrind run with parameters below the threshold terminated flawlessly. However, if I run the program under valgrind with a critical value for the number of contents, after a certain point I start getting memory access errors in functions that previously presented no problems:
```
==5987== Invalid read of size 8
==5987== at 0x40524E: Scheduler::advanceClock() (Scheduler.cpp:38)
==5987== by 0x45BA73: TestRun::execute() (TestRun.cpp:73)
==5987== by 0x45522B: main (CDSim.cpp:131)
==5987== Address 0x2e63bc70 is 0 bytes inside a block of size 32 free'd
==5987== at 0x4C2A4BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5987== by 0x405487: Scheduler::advanceClock() (Scheduler.cpp:69)
==5987== by 0x45BA73: TestRun::execute() (TestRun.cpp:73)
==5987== by 0x45522B: main (CDSim.cpp:131)
==5987==
==5987== Invalid read of size 4
==5987== at 0x40584E: Request::getSimTime() const (Event.hpp:45)
==5987== by 0x40525C: Scheduler::advanceClock() (Scheduler.cpp:38)
==5987== by 0x45BA73: TestRun::execute() (TestRun.cpp:73)
==5987== by 0x45522B: main (CDSim.cpp:131)
==5987== Address 0x2e63bc78 is 8 bytes inside a block of size 32 free'd
==5987== at 0x4C2A4BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5987== by 0x405487: Scheduler::advanceClock() (Scheduler.cpp:69)
==5987== by 0x45BA73: TestRun::execute() (TestRun.cpp:73)
==5987== by 0x45522B: main (CDSim.cpp:131)
==5987==
```
I know it might be hard to give an answer without seeing the whole code, but is there a "high-level" hint as to what might be going on here? I don't understand why a function that seems to work normally suddenly starts misbehaving. Am I missing something obvious?
The offending line in the valgrind log above (Scheduler.cpp:38) is `if (nextEvent->getSimTime() < this->getSimTime())` in the following block:
```cpp
bool Scheduler::advanceClock() {
    if (pendingEvents.size() == 0) {
        std::cerr << "WARNING: Scheduler::advanceClock() - Empty event queue before "
                     "reaching the termination event" << std::endl;
        return false;
    }
    const Event* nextEvent = pendingEvents.top();
    // Check that the event is not scheduled in the past
    if (nextEvent->getSimTime() < this->getSimTime()) {
        std::cerr << "Scheduler::advanceClock() - Event scheduled in the past!"
                  << std::endl;
        std::cerr << "Simulation time: " << this->getSimTime()
                  << ", event time: " << nextEvent->getSimTime()
                  << std::endl;
        exit(ERR_EVENT_IN_THE_PAST);
    }
    // Update the clock with the current event time (>= previous time)
    this->setSimTime(nextEvent->getSimTime());
    // ...
```
where `pendingEvents` is a `boost::heap::binomial_heap`.
I finally found what the problem was. When an event was completed and needed to be removed from the queue, my code went something like this:
```cpp
    // ...
    // Data transfer completed, remove event from queue
    // Notify the oracle, which will update the cache mapping and free resources
    // in the topology
    oracle->notifyCompletedFlow(nextEvent, this);
    // Remove flow from top of the queue
    pendingEvents.pop();
    handleMap.erase(nextEvent);
    delete nextEvent;
    return true;
```
The problem was that `oracle->notifyCompletedFlow()` invoked some scheduler methods that dynamically update the priority of scheduled events (e.g. to react to a change in the available bandwidth in the network). So by the time I removed the top of the queue with `pendingEvents.pop()`, in some cases I was popping a different event and leaving the already deleted `nextEvent` in the heap, where the next call to `advanceClock()` read it again through `top()`. That matches the valgrind log: the block is freed at Scheduler.cpp:69 and read at Scheduler.cpp:38. Popping the queue before invoking the oracle solved the problem.
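The failure can be reproduced in isolation with a toy queue. This is a minimal sketch, not the author's code: `Event`, `EventQueue`, `reschedule()`, `notifyCompletedFlowStub()` and the two `complete...` functions are hypothetical, and a `std::vector` kept as a heap (rebuilt with `std::make_heap` on every priority change) stands in for the `boost::heap::binomial_heap` and its handles. The stub moves another event ahead of the one being completed; the new time 4.0 is arbitrary and only chosen so that it overtakes the event at t=5. Events are stack objects and are never deleted here, so the stale entry can be detected without undefined behaviour. In the original code, the `delete nextEvent` that follows turns it into a dangling pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal event: only the simulation time, used as priority.
struct Event {
    int id;
    double simTime;
};

// Heap comparator: the event with the smallest simTime ends up on top.
struct Later {
    bool operator()(const Event* a, const Event* b) const {
        return a->simTime > b->simTime;
    }
};

// Hypothetical stand-in for pendingEvents: a std::vector kept as a heap.
// reschedule() changes a priority and rebuilds the heap, playing the role
// of a handle-based update on boost::heap::binomial_heap.
class EventQueue {
public:
    void push(Event* e) {
        heap_.push_back(e);
        std::push_heap(heap_.begin(), heap_.end(), Later{});
    }
    Event* top() const {
        assert(!heap_.empty());
        return heap_.front();
    }
    void pop() {
        assert(!heap_.empty());
        std::pop_heap(heap_.begin(), heap_.end(), Later{});
        heap_.pop_back();
    }
    std::size_t size() const { return heap_.size(); }
    void reschedule(Event* e, double newTime) {
        e->simTime = newTime;
        std::make_heap(heap_.begin(), heap_.end(), Later{});
    }
    bool contains(const Event* e) const {
        return std::find(heap_.begin(), heap_.end(), e) != heap_.end();
    }

private:
    std::vector<Event*> heap_;
};

// Hypothetical stand-in for oracle->notifyCompletedFlow(): reacting to the
// completed flow, it moves another event ahead of it in the queue.
void notifyCompletedFlowStub(EventQueue& q, Event* other) {
    q.reschedule(other, 4.0);
}

// Order from the question: notify first, then pop.
// Returns true if the completed event is still in the queue afterwards.
bool completeNotifyThenPop(EventQueue& q, Event* other) {
    Event* nextEvent = q.top();
    notifyCompletedFlowStub(q, other);
    q.pop();  // removes whatever is on top now, possibly not nextEvent
    return q.contains(nextEvent);
}

// Order from the fix: pop first, then notify.
// Returns true if the completed event is still in the queue afterwards.
bool completePopThenNotify(EventQueue& q, Event* other) {
    Event* nextEvent = q.top();
    q.pop();  // nextEvent is still on top here, so it is the one removed
    notifyCompletedFlowStub(q, other);
    return q.contains(nextEvent);
}
```

With `a` at t=5 and `b` at t=10 pushed into a fresh queue, `completeNotifyThenPop(q, &b)` returns `true` and leaves `a` as the only queued event, whereas `completePopThenNotify(q, &b)` returns `false`. Rebuilding the whole heap on each update costs O(n) and is only meant for illustration; a mutable heap with handles, as in the original code, avoids that cost.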
I apologize for leaving out pieces of code that might have led to a quicker answer; I'll try to learn from my mistake :) Thanks for pointing me in the right direction.
It may be something to do with `const Event* nextEvent = pendingEvents.top();`. It looks like `pendingEvents` is some kind of stack. You may try this: