matthias_buehlmann

Reputation: 5071

Heap corruption caused by race condition - does not happen when application is slowed down. How to Debug?

We are experiencing a crash in a Windows C++ application right at startup. The crash currently happens only on our Windows 8.1 machine (the other development machines run Windows 7) and only in release builds. The stack trace is slightly different each time but always involves memory allocation, so heap corruption is the likely cause.

The problem is that, as soon as the application is slowed down a bit, the crash does not occur:

In all the cases where the crash does not happen, the application is slowed down slightly, and startup in particular takes a bit longer, so my assumption is that this is heap corruption caused by a race condition.

If we can't use the CRT debug heap or tools that slow down execution (because the app does not crash then), what are good approaches to tracking down the circumstances under which the heap gets corrupted?

Upvotes: 0

Views: 924

Answers (1)

dmi

Reputation: 1479

The behavior you describe suggests that your software has a timing-sensitive problem with dynamic memory. The only approach I can recommend is a code review focused on variables that use dynamic allocation or refer to dynamically allocated data: in particular STL containers and any other objects allocated via new/malloc or similar. As a first step, find all such variables that are shared between threads and check whether:

  1. The variables are initialized before the first use.
  2. The lifetime of each object is longer than its use. For the data this means it must not be used before it is allocated or after it is deallocated.
  3. The variables are protected against simultaneous read/write from different threads.
  4. The logical read/write sequence ensures that reading a variable is safe even if no thread has written it yet.
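As a rough illustration of what the four checks look for, here is a minimal sketch of a container shared between threads. `SharedLog` and `fillConcurrently` are hypothetical names made up for this example and are not taken from the application in the question.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared object; name and layout are illustrative only.
class SharedLog {
public:
    // Check 3: every write takes the lock, so two threads never
    // reallocate the vector's heap buffer at the same time.
    void append(const std::string& line) {
        std::lock_guard<std::mutex> lock(mutex_);
        lines_.push_back(line);
    }

    // Checks 3 and 4: readers take the same lock and receive a copy.
    // Reading before any write simply yields an empty snapshot.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lines_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> lines_;  // Check 1: default-constructed, never uninitialized.
};

// Starts `threads` writers that each append `perThread` lines.
// Check 2: all writers are joined before returning, so `log` outlives
// every thread that holds a reference to it.
std::size_t fillConcurrently(SharedLog& log, int threads, int perThread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&log, perThread] {
            for (int i = 0; i < perThread; ++i) {
                log.append("entry");
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return log.snapshot().size();
}
```

Without the lock in `append`, concurrent `push_back` calls are exactly the kind of defect that corrupts the heap only under certain timings, which is why the review should look at every shared container this way.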

If that turns up nothing, run a static code analysis tool (e.g. lint or similar) and go through all compiler warnings, if you have any.

Update: one more possibility is to replace the memory allocators with your own versions that add head and tail guard areas around each allocation, and to check on every call that the guard patterns are still intact. Once a pattern is found corrupted, you can at least dump the data and, together with the call stack, identify the place in the software first affected by the corruption. That narrows the scope of the analysis considerably. Keep in mind, though, that this may also change the timing enough that the corruption no longer occurs.
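A minimal sketch of the guard-area idea might look like the following. The `guarded` namespace, its function names, the guard size and the fill byte are all assumptions chosen for illustration. Wiring this into `operator new`/`delete` or `malloc`, dumping data, and capturing the call stack are left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace guarded {  // hypothetical namespace, for illustration only

// Guard size is a multiple of the strictest fundamental alignment so the
// pointer handed to the caller stays suitably aligned.
constexpr std::size_t kGuardSize = alignof(std::max_align_t);
constexpr unsigned char kGuardByte = 0xFD;  // arbitrary fill pattern

// Stored in front of the head guard; remembers the user-visible size.
struct alignas(std::max_align_t) Header {
    std::size_t size;
};

// Layout: [Header][head guard][user bytes][tail guard]
void* allocate(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(Header) + size + 2 * kGuardSize));
    if (raw == nullptr) {
        return nullptr;
    }
    Header header{size};
    std::memcpy(raw, &header, sizeof header);
    unsigned char* user = raw + sizeof(Header) + kGuardSize;
    std::memset(user - kGuardSize, kGuardByte, kGuardSize);  // head guard
    std::memset(user + size, kGuardByte, kGuardSize);        // tail guard
    return user;
}

// Returns true if both guard areas still hold the fill pattern.
bool isIntact(const void* p) {
    const auto* user = static_cast<const unsigned char*>(p);
    Header header;
    std::memcpy(&header, user - kGuardSize - sizeof(Header), sizeof header);
    const unsigned char* head = user - kGuardSize;
    const unsigned char* tail = user + header.size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (head[i] != kGuardByte || tail[i] != kGuardByte) {
            return false;
        }
    }
    return true;
}

// Checks the guards, frees the block, and reports whether it was intact.
bool release(void* p) {
    if (p == nullptr) {
        return true;
    }
    const bool intact = isIntact(p);
    std::free(static_cast<unsigned char*>(p) - kGuardSize - sizeof(Header));
    return intact;
}

}  // namespace guarded
```

A caller would use `guarded::allocate(n)` in place of `malloc(n)`, call `guarded::isIntact(p)` at suspicious points, and treat a `false` result from `guarded::release(p)` as the signal to dump state. The sketch only catches writes that land inside the guard areas, and it trusts the stored header, so a write that destroys the header itself can mislead the check.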

Upvotes: 1
