Reputation: 27

How do I troubleshoot an illegal memory access crash that only occurs on a client's system?

I am attempting to troubleshoot a problem with our application that only occurs on a particular server belonging to one of our customers.

The application sometimes crashes, and the core files show an illegal memory access. I suspect a failed call to malloc: it is probably returning a NULL pointer, even though the machine still has plenty of free memory when this happens. My theory is that memory has become too fragmented, so when the application tries to allocate a large block (18 MB), the request fails.
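One way to test the fragmentation theory, assuming you can add a small diagnostic to a build of the application running on the customer's server, is to measure the largest single block malloc can currently return. Each process has its own address space, so the probe only means something when it runs inside the affected process, for example just before the 18 MB allocation. The function name `largest_malloc_block` is hypothetical; treat this as a sketch, not a finished tool:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical diagnostic: returns the largest size in bytes (up to `cap`)
   that a single malloc call can currently satisfy in THIS process.
   Every successful probe is freed immediately. The binary search assumes
   that if a request of size s fails, every larger request fails too. */
size_t largest_malloc_block(size_t cap)
{
    size_t lo = 0;   /* largest size known to succeed */
    size_t hi = cap; /* largest size not yet ruled out */

    while (lo < hi) {
        /* Midpoint rounded up, written so it cannot overflow and always lies
           in (lo, hi], which guarantees the loop makes progress. */
        size_t mid = lo + (hi - lo) / 2 + 1;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;      /* mid fits, so look for something larger */
        } else {
            hi = mid - 1;  /* mid does not fit, so look smaller */
        }
    }
    return lo;
}
```

If this reports well under 18 MB while the system as a whole still shows plenty of free memory, fragmentation of the process's address space becomes a plausible explanation. That is most likely in a 32-bit process, whose address space is much smaller than a 64-bit one.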

What steps can I take to troubleshoot this problem? For example, does Windows log any information when a memory allocation fails, or does it silently ignore it?

The server in question is running Windows Server 2008 R2 and the Windows Event Log service is running.

At this point I can't include any code, because I don't know what part of the application is causing the problem. How can I narrow this down?

Upvotes: 0

Answers (2)

Christopher Pisz

Reputation: 4010

No. Windows does not record failed malloc calls in the Event Log. The Windows Event Log is something you have to set up and write to from your own code, usually in a Windows service.

Please show a code snippet demonstrating how you are allocating memory.

It is standard practice to check the return value of every call to malloc for NULL.
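A minimal sketch of such a check, assuming you can route allocations through a wrapper; `checked_malloc` and `CHECKED_MALLOC` are hypothetical names, not part of any library. Recording `__FILE__` and `__LINE__` also tells you which allocation failed:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: behaves like malloc, but when the allocation fails
   it writes the requested size and the call site to stderr before returning
   NULL, so the failure is visible even if the caller's NULL check is missing. */
void *checked_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        fprintf(stderr, "%s:%d: malloc(%zu) returned NULL\n", file, line, size);
        fflush(stderr);
    }
    return p;
}

/* Hypothetical convenience macro that captures the caller's file and line. */
#define CHECKED_MALLOC(size) checked_malloc((size), __FILE__, __LINE__)
```

A call such as `char *buf = CHECKED_MALLOC(18u * 1024 * 1024);` still needs its own `if (buf == NULL)` branch; the wrapper only makes the failure visible and points at the call site.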

If you are not sure which call to malloc is failing, your best bet is to invest in a good profiler. I've had a good amount of success using Intel Parallel Studio, but it isn't cheap. Also keep in mind that every profiler I've ever tried fails to work across COM boundaries.

"Illegal memory access" is not necessarily a failure to allocate memory. It could be all manner of things. You need to break down your software into testable units and pinpoint the problem before worrying about how to resolve it.

Edit (after question revision): You are really limiting yourself with the constraint "We cannot alter the code".

You should begin by searching every file for malloc or new and ensuring that the result is checked.
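The search itself is what `grep -n` or `findstr /n` would do from the command line. As a sketch of the idea in C (the name `list_allocation_sites` is hypothetical), this lists every line that mentions an allocation. It cannot tell whether the result is checked, it also matches comments and strings, and it assumes lines shorter than 1024 characters, so each hit still needs a human review:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints "name:line: text" for every line of `src`
   that mentions malloc, calloc, realloc or `new `, and returns the number
   of matching lines. Purely textual; it does not parse C or C++. */
int list_allocation_sites(FILE *src, const char *name, FILE *out)
{
    static const char *const patterns[] = { "malloc(", "calloc(", "realloc(", "new " };
    char buf[1024]; /* assumes no source line is longer than this */
    int line_no = 0;
    int hits = 0;

    while (fgets(buf, sizeof buf, src) != NULL) {
        line_no++;
        for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
            if (strstr(buf, patterns[i]) != NULL) {
                fprintf(out, "%s:%d: %s", name, line_no, buf);
                hits++;
                break; /* report each line only once */
            }
        }
    }
    return hits;
}
```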

You also have the option of turning optimization off, exporting symbols, creating a build, installing debugging tools on the server, and debugging remotely while stepping through the code to narrow down where the trouble is. However, that is probably only practical if you have steps to reproduce the crash. These kinds of memory problems usually look random because they are caused by a bug in the code: the crash may show up in one build and not another, but the bug is still there in both.

You can also profile remotely, but profiling an entire application or service yields nearly unusable results. Software should be broken down into unit-testable parts, and those in turn into integration-testable parts. If it was not, this is the price you pay (even if it wasn't your fault).

Upvotes: 1

Χpẘ

Reputation: 3451

This is a classic debugging situation. You have an immediate failure (the illegal memory access) and you need to work back to a root cause. If you were debugging in assembly, you would most likely see a register holding an invalid memory pointer that was being used to access memory. After identifying that register, you would work backward to see where it got its value.

If it got its value from a memory allocation call then your theory might be right.

If it got its value from another register or memory location whose correct value you don't know, then you have an "intermediate cause": the second register or memory location caused the first register to hold an invalid pointer.

You keep working your way back through intermediate causes until you find the root cause: something that is broken and that someone could fix. You may have to go a long way up the call stack, or a long way back within a particular function, to find the next intermediate cause. If you're unlucky, the root cause may be a memory overwrite, a race condition, or something else that interferes with what is otherwise a mostly deductive process.
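A memory overwrite is hard to trace because the damage is only noticed later, somewhere else. One common technique for catching it closer to its source is to surround each allocation with guard bytes and check them at checkpoints; debug heaps do something similar automatically. A minimal sketch, assuming you can rebuild with a wrapper allocator (`guarded_malloc`, `guards_intact` and `guarded_free` are hypothetical names, and the fill value is arbitrary):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PREFIX     32   /* stored size + front guard; keeps user data 16-byte aligned
                           if malloc itself returns 16-byte-aligned blocks */
#define BACK_GUARD 16   /* guard bytes after the user data */
#define GUARD_BYTE 0xFD /* arbitrary fill value for the guard regions */

/* Hypothetical allocator: [size][front guard][n user bytes][back guard]. */
void *guarded_malloc(size_t n)
{
    if (n > SIZE_MAX - PREFIX - BACK_GUARD)
        return NULL; /* total size would overflow */
    unsigned char *base = malloc(PREFIX + n + BACK_GUARD);
    if (base == NULL)
        return NULL;
    memcpy(base, &n, sizeof n);                          /* remember user size */
    memset(base + sizeof n, GUARD_BYTE, PREFIX - sizeof n);
    memset(base + PREFIX + n, GUARD_BYTE, BACK_GUARD);
    return base + PREFIX;
}

/* Returns false if anything wrote into either guard region. */
bool guards_intact(const void *p)
{
    const unsigned char *user = p;
    const unsigned char *base = user - PREFIX;
    size_t n;
    memcpy(&n, base, sizeof n);
    for (size_t i = sizeof n; i < PREFIX; i++)
        if (base[i] != GUARD_BYTE)
            return false; /* underrun: something wrote before the block */
    for (size_t i = 0; i < BACK_GUARD; i++)
        if (user[n + i] != GUARD_BYTE)
            return false; /* overrun: something wrote past the end */
    return true;
}

void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - PREFIX);
}
```

Calling `guards_intact` at a few checkpoints (for example just before each `guarded_free`) narrows the window in which the overwrite happened, which turns a random-looking crash into an intermediate cause you can work back from.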

If you can do source debugging (probably not if you have a third-party app), you can avoid dealing with assembly language.

BTW, if you do have a third-party app, chances are good there won't be anything you can do to fix the problem on your own, even if you do find a root cause. You'll likely need an update from the software vendor.

If the software is open source you do have more options. You can download the source, fix errors, and rebuild. Or you can push a fix back into the OSS project.

Upvotes: 1