Reputation: 6145
During development of our application we have encountered a particularly nasty bug. The symptom is quite simply that the process disappears. The logs just end abruptly, no crash dumps or anything else can be found, and no zombie processes exist. Dr. Watson hasn't noticed anything, leaving us without any trace.
The error is not simple to reproduce: it takes on average 3-4 hours of performing the same actions repeatedly. So somewhere there is some kind of race condition. We have special functions handling both SEH and normal exceptions, so none of these should go unnoticed.
The debugging must be done on a special computer, because the application runs on very specialized hardware, so only remote debugging is available. And when the remote debugger is connected, C++Builder doesn't notice that the application is gone, and crashes and burns when we try to do any debugging on the nonexistent process.
We are using a great variety of technologies with this software:
So, as you understand, I do not have much to work with here. What I am doing now is trying to narrow it down by logging in different places in the code, to find out whether the error occurs at some particular point. I am also trying to remove as many aspects as possible of the action I am performing, to get the case as simple as possible. But this is a really complex operation and the process is taking a lot of time, and time is (as usual) a scarce resource.
I am wondering if anyone out there has good tips for me, either about the cause (in general, what causes a process to just stop without any notification) or about techniques for debugging such an elusive failure?
Upvotes: 4
Views: 1832
Reputation: 26094
Try running it with a smaller heap. If the problem is due to the fact that you run out of memory, this will cause the crash to happen sooner.
Upvotes: 2
Reputation: 11438
When native code under Windows experiences a stack overflow (typically due to infinite recursion) the process sometimes disappears exactly as you describe. The standard error dialogs and exception handling require some stack space, and where there is none left they cannot run. (Later versions of Windows handle this better and should always raise an exception - Windows XP is not "later" under this definition.)
The easiest brute-force way to debug this is to write log messages at the entry (and maybe the exit) to each function. These messages have to go directly to a file, and if you have buffered output (e.g. cout or similar) you should flush it immediately each time. When you manage to cause the crash, you'll have close to a stack trace that can at least localise the issue.
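As a rough illustration of the entry/exit logging described above, here is a minimal sketch. The class name ScopeLog and the functions inner and outer are hypothetical and not taken from the question's code base. The sketch assumes plain C stdio is available and that flushing to the OS after every line is acceptable for performance.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: writes one line when a function is entered and one
// when it is left. Every line is flushed at once, so the last message has
// been handed to the OS even if the process vanishes right afterwards.
class ScopeLog {
public:
    ScopeLog(std::FILE* out, const char* name) : out_(out), name_(name) {
        assert(name != nullptr);
        write("enter");
    }
    ~ScopeLog() { write("leave"); }

    ScopeLog(const ScopeLog&) = delete;
    ScopeLog& operator=(const ScopeLog&) = delete;

private:
    void write(const char* what) {
        if (!out_) return;                       // logging disabled
        std::fprintf(out_, "%s %s\n", what, name_);
        std::fflush(out_);                       // do not leave data in the stdio buffer
    }

    std::FILE* out_;
    const char* name_;
};

// Example functions (hypothetical) instrumented with the helper.
int inner(std::FILE* log) {
    ScopeLog scope(log, "inner");
    return 42;
}

int outer(std::FILE* log) {
    ScopeLog scope(log, "outer");
    return inner(log) + 1;
}
```

If the process dies inside a function, the log ends with an "enter" line that has no matching "leave" line, which points at the function that was running. Note that the logging itself needs a little stack space, so in a genuine stack overflow the very last entry may be missing.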
Infinite recursion is not the only cause of a stack overflow (though it is the more common one). If very large variables (typically arrays with thousands/millions of elements) are allocated on the stack the same issue may occur. In particular, the alloca() "function" can disguise the cause of this type of stack overflow.
If you run under a debugger and break/log on guard page exceptions you will be notified when the stack is expanding - let the exception be handled, since it is being used to commit more memory and may not actually be related to the issue.
The final non-stack-overflow cause of a disappearing process is a stray call to exit() or ExitProcess(). A full text search should be able to mostly rule this out; a breakpoint on the ExitProcess function in a debugger will do so completely.
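When a debugger breakpoint is impractical, a cheap complementary check for a stray exit() is to register a handler with std::atexit that leaves a trace in a file. The following is only a sketch: the function names and the log file name are hypothetical, and it covers exit() and a normal return from main only. As far as I know, a direct ExitProcess() call, abort(), or a hard crash will typically bypass such handlers, so an empty trace file does not rule those out.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical default file name; replaced by install_exit_logger().
static const char* g_exit_log_path = "exit_trace.log";

// Runs when the process ends through exit() or a return from main().
void log_normal_exit() {
    if (std::FILE* f = std::fopen(g_exit_log_path, "a")) {
        std::fputs("process is leaving through exit() or return from main\n", f);
        std::fclose(f);   // fclose flushes the line to the OS
    }
}

// Call once, early during startup. The path must stay valid until the
// process ends (a string literal is fine). Returns true on success.
bool install_exit_logger(const char* path) {
    assert(path != nullptr);
    g_exit_log_path = path;
    return std::atexit(log_normal_exit) == 0;
}
```

If the trace line shows up after a "disappearance", something called exit(); if it does not, the process was most likely terminated by another route.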
Upvotes: 7
Reputation: 257
Why don't you try WinDbg? It can also connect remotely via a named pipe or serial port.
NO BSOD, no Rootkit , no Fun ~~ Biswanth Chowdhury - Win32 Kernel*
Upvotes: 2
Reputation: 2155
If you want to be able to debug the scenario more often, try running this in a Virtual Machine and taking "snapshots" every so often before it happens.
The problem here could be inconsistency with the states of the specialized hardware you mention you have connected via serial port.
Upvotes: 1