Reputation: 5071
We are experiencing a crash in a Windows C++ application right on startup. The crash currently happens only on our Windows 8.1 machine (the other development machines run Windows 7) and only in release builds. The stack trace is a bit different each time, but it is always related to memory allocation, so it's likely a heap corruption problem.
The problem is that, as soon as the application is slowed down a bit, the crash does not occur:
Debug builds do not crash.
If the release build is linked against the debug CRT (static or dynamic), the crash does not occur, so the CRT debug heap can't be used to track down the problem.
If Application Verifier is hooked to the application and the 'heap' tests are selected, the application does not crash.
Running the application through "Dr.Memory" also causes the crash to not happen.
In all these cases where the crash does not happen, the application is slightly slowed down and especially startup does take a bit longer, so my assumption is that it's a heap corruption caused by a race condition.
If we can't use the CRT debug heap or tools that slow down the app's execution (because it does not crash then), what are good approaches to tracking down the circumstances under which the heap gets corrupted?
Upvotes: 0
Views: 924
Reputation: 1479
The behavior you describe suggests that your software has a timing-sensitive problem with dynamic memory. I would recommend starting with a code review focused on variables that use dynamic allocation or refer to dynamically allocated data: in particular STL containers and any other objects allocated via new/malloc or similar. As a first step, you could find all such variables that are shared between different threads and analyze whether:
If nothing is found, then run some static code analysis (e.g. lint or a similar tool) and go through all compiler warnings, if you have any.
Updated: one more possibility. You can define your own memory allocators that add head and tail guard areas around each allocation, and check on every call that the guard patterns are still intact. Once a check fails, you can at least dump the data and, together with the call stack, identify the place in the software where the corruption was first noticed. That reduces the scope of the analysis considerably. But keep in mind that this may also change the timing enough that the corruption no longer happens.
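To make the guard-area idea more concrete, here is a minimal sketch of such an allocator. All names (`guarded_alloc`, `guarded_ok`, `guarded_free`, `kGuard`, `kPattern`) are hypothetical and made up for illustration, and the 16-byte guard size and the 0xFD fill value are arbitrary assumptions. It only checks the block it is handed, so a real version would probably also keep a list of live blocks and walk all of them on every call.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper names; none of these are part of the CRT.
constexpr std::size_t kGuard = 16;           // bytes of pattern on each side (assumed)
constexpr std::size_t kPrefix = 2 * kGuard;  // padded size slot + front guard
constexpr unsigned char kPattern = 0xFD;     // arbitrary fill value (assumed)

static_assert(kPrefix % alignof(std::max_align_t) == 0, "payload must stay aligned");
static_assert(sizeof(std::size_t) <= kGuard, "size field must fit in its slot");

// Block layout: [size | padding][front guard][payload][tail guard]
void* guarded_alloc(std::size_t size) {
    if (size > SIZE_MAX - kPrefix - kGuard) return nullptr;  // avoid overflow
    auto* raw = static_cast<unsigned char*>(std::malloc(kPrefix + size + kGuard));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);                 // remember payload size
    std::memset(raw + kGuard, kPattern, kGuard);          // front guard
    std::memset(raw + kPrefix + size, kPattern, kGuard);  // tail guard
    return raw + kPrefix;
}

// Returns true if both guard areas of this block still hold the pattern.
// Limitation: if the stored size itself was overwritten, the tail check
// reads from the wrong place.
bool guarded_ok(const void* p) {
    const auto* payload = static_cast<const unsigned char*>(p);
    const unsigned char* raw = payload - kPrefix;
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (raw[kGuard + i] != kPattern) return false;    // underrun
        if (payload[size + i] != kPattern) return false;  // overrun
    }
    return true;
}

void guarded_free(void* p) {
    if (!p) return;
    // Abort at the first detection so the crash dump and call stack
    // point at the place where the damage was noticed.
    if (!guarded_ok(p)) std::abort();
    std::free(static_cast<unsigned char*>(p) - kPrefix);
}
```

You would then route your `new`/`malloc` calls through these functions (for example by replacing the global `operator new` and `operator delete`) and call `guarded_ok` at suspicious points. The check reports where the corruption was detected, not necessarily where it was caused.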
Upvotes: 1