Reputation: 518
I am getting a crash that I believe is caused by a memory leak (it happens 7 layers deep, in code that merely walks over a linked list; there are no allocations there).
It is fairly reproducible, almost on a daily basis, so I can always get a fresh core file. I have spent the last 3-5 days going over the code, pairing allocations with deallocations, but I cannot find the place that would cause it: the legacy C application is huge and full of memcpy/malloc/calloc calls all over the place. Frankly, one wrong memcpy is all it might take.
I went to the effort of compiling Valgrind locally, hoping to get a nice trace of where it started, but Valgrind just makes the machine inoperable, i.e. it has to be restarted manually in the server room, as even ssh cannot be used. We basically lost two days of debugging due to Valgrind, so I cannot use it a third time (unless Memcheck could somehow work with core files?).
Is there some other tool that could help me analyze the core file for memory leaks? gdb with the print command is not exactly helpful.
To be more specific, some core files are really huge (1.5 GB, while they should not be over 0.3 GB), so I was hoping to get a list of the top 2-3 offenders that occupy the most memory (which would give me a direct hint as to where to look next).
Any ideas?
Oh, and as for stability: it can properly handle about a million data requests (sometimes a couple of million) before it crashes, so just putting a breakpoint where it usually crashes is out of the question.
Upvotes: 0
Views: 8366
Reputation: 6771
As a core file contains the raw memory dump of the process (embedded in an ELF data structure, which you can pretty much ignore here), you might be able to look at the bulk of the data in the core file and watch for repeating patterns and familiar data (like strings). This is described pretty well in https://stackoverflow.com/a/8714719/2148773 .
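A minimal sketch of the "repeating patterns" idea, assuming the core was produced on the same machine (so 8-byte words are in host byte order) and that leaked objects are 8-byte aligned. It builds a histogram of every non-zero 8-byte word in the file and keeps the most frequent ones. If one structure type is leaked millions of times, one of its fields (a pointer to a shared object, a type tag, a magic value) may stand out at the top of the list; you can then inspect the bytes around such a value in gdb with the `x` command. `top_words` and `struct word_count` are hypothetical names, and this is a crude heuristic, not a leak detector.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One slot of an open-addressing hash table: a 64-bit word and its count. */
struct word_count {
    uint64_t word;
    uint64_t count;
    int used;
};

/* Mix the bits so that nearby pointer values spread across the table. */
static size_t slot_for(uint64_t w, size_t mask)
{
    w ^= w >> 33;
    w *= 0xff51afd7ed558ccdULL;
    w ^= w >> 33;
    return (size_t)w & mask;
}

/* Hypothetical helper: counts every non-zero 8-byte word in `path` and
 * writes the `top_n` most frequent ones to `out`, sorted by count
 * (descending). Returns the number of entries written, or -1 on error. */
int top_words(const char *path, struct word_count *out, int top_n)
{
    const size_t nslots = (size_t)1 << 20;      /* must be a power of two */
    const size_t mask = nslots - 1;
    size_t used = 0;
    struct word_count *table = calloc(nslots, sizeof *table);
    FILE *f = fopen(path, "rb");
    if (!table || !f) {
        free(table);
        if (f) fclose(f);
        return -1;
    }

    uint64_t buf[4096];
    size_t n;
    while ((n = fread(buf, sizeof buf[0], 4096, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            uint64_t w = buf[i];
            if (w == 0)
                continue;                       /* core files are full of zero pages */
            size_t s = slot_for(w, mask);
            while (table[s].used && table[s].word != w)
                s = (s + 1) & mask;             /* linear probing */
            if (!table[s].used) {
                if (used >= nslots / 2)
                    continue;                   /* table half full: ignore new values */
                table[s].used = 1;
                table[s].word = w;
                used++;
            }
            table[s].count++;
        }
    }
    fclose(f);

    /* Keep the top_n entries with a small insertion sort into `out`. */
    int filled = 0;
    for (size_t s = 0; s < nslots; s++) {
        if (!table[s].used)
            continue;
        int pos = filled;
        while (pos > 0 && out[pos - 1].count < table[s].count)
            pos--;
        if (pos >= top_n)
            continue;
        int last = filled < top_n ? filled : top_n - 1;
        memmove(&out[pos + 1], &out[pos], (size_t)(last - pos) * sizeof *out);
        out[pos] = table[s];
        if (filled < top_n)
            filled++;
    }
    free(table);
    return filled;
}
```

For example, `top_words("core.12345", top, 10)` with `struct word_count top[10]` fills `top` with the ten most common values, which you could print with `printf("%016llx %llu\n", (unsigned long long)top[i].word, (unsigned long long)top[i].count)`. The hash table is capped at half of its 2^20 slots so probing always terminates; values first seen after that point are simply not counted.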
Upvotes: 0
Reputation:
I think you're mixing up memory leaks and memory corruption.
If you have a memory leak, eventually a call to malloc() should return NULL, and your program should have code to detect and log that. Unfortunately, it's more likely that malloc() will succeed, but using the memory will cause the OS to OOM-kill your process, which is more difficult to debug. Oh well.
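A minimal sketch of "detect and log that", assuming you can route allocations through a wrapper macro. `xmalloc_at`, `XMALLOC` and `xmalloc_failures` are hypothetical names. Recording `__FILE__`/`__LINE__` tells you which call site hit the failure. As noted above, on a system that overcommits memory this may never fire, because malloc() succeeds and the process is OOM-killed later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of allocations that returned NULL (hypothetical counter). */
static unsigned long xmalloc_failures = 0;

/* Hypothetical wrapper: calls malloc() and logs the requesting call site
 * when it returns NULL, instead of letting a NULL propagate silently.
 * malloc(0) may legitimately return NULL, so that case is not logged. */
void *xmalloc_at(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        xmalloc_failures++;
        fprintf(stderr, "%s:%d: malloc(%zu) failed\n", file, line, size);
    }
    return p;
}

/* Use in place of malloc() so every call records where it came from. */
#define XMALLOC(size) xmalloc_at((size), __FILE__, __LINE__)
```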
If you have memory corruption (possibly via memcpy(), which will not cause a memory leak), a call to any of the C memory allocation routines may cause the C library to detect the heap corruption and abort your application. This should come with a diagnostic like "heap corruption detected/invalid next block size" or similar.
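One defensive pattern against "one wrong memcpy", sketched here under the assumption that the destination size is known at each call site: check the length before copying and log the offending call site instead of silently overwriting whatever follows the buffer (in allocators like glibc's, that is the next chunk's bookkeeping, which is what later triggers the "invalid next size" kind of abort). `checked_memcpy` and `CHECKED_MEMCPY` are hypothetical names, not a standard API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies n bytes only if they fit into the dst_size
 * bytes available at dst. Returns 0 on success, or -1 after logging the
 * call site when the copy would overflow the destination. */
int checked_memcpy(void *dst, size_t dst_size, const void *src, size_t n,
                   const char *file, int line)
{
    if (n > dst_size) {
        fprintf(stderr, "%s:%d: memcpy of %zu bytes into %zu-byte buffer\n",
                file, line, n, dst_size);
        return -1;
    }
    memcpy(dst, src, n);
    return 0;
}

/* Records the caller's location automatically. */
#define CHECKED_MEMCPY(dst, dst_size, src, n) \
    checked_memcpy((dst), (dst_size), (src), (n), __FILE__, __LINE__)
```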
The advantage of memory corruption over memory leaks is that an out-of-bounds read/write is unambiguously a bug, while leaking memory can be more subtle.
If valgrind is too slow, memory corruption can instead be found using AddressSanitizer, which has a much lower overhead.
Upvotes: 0
Reputation: 7698
I'd try creating a test set of inputs that brings the system up, runs a number of transactions, and then brings it down in a controlled (i.e. everything should be cleaned up) manner. Run that small suite under valgrind and it should at least give you stuff to chase. If it is an older system, you are likely to have false positives to chase. If you haven't found it by then, you will need to come up with more diverse tests.
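A minimal sketch of such a harness. `app_init`, `app_handle_request`, `app_shutdown` and `run_suite` are hypothetical stand-ins for the real application's entry points, modeled here as a linked list of cached requests. The important part is the controlled shutdown: once everything the application owns is freed, any block Valgrind still reports is worth chasing instead of being shutdown noise.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for the application's state: a linked list of entries. */
struct entry {
    struct entry *next;
    char payload[32];
};

static struct entry *cache_head = NULL;

static void app_init(void) { cache_head = NULL; }

/* One "transaction": allocate an entry and push it onto the list. */
static int app_handle_request(int id)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    snprintf(e->payload, sizeof e->payload, "request %d", id);
    e->next = cache_head;
    cache_head = e;
    return 0;
}

/* Controlled shutdown: free everything and report how many blocks. */
static size_t app_shutdown(void)
{
    size_t freed = 0;
    while (cache_head != NULL) {
        struct entry *next = cache_head->next;
        free(cache_head);
        cache_head = next;
        freed++;
    }
    return freed;
}

/* Bring the system up, run a fixed number of transactions, bring it down.
 * Shutdown runs even if a transaction fails, so nothing is left behind. */
static int run_suite(int transactions, size_t *freed)
{
    int rc = 0;
    app_init();
    for (int i = 0; i < transactions; i++) {
        if (app_handle_request(i) != 0) {
            rc = -1;
            break;
        }
    }
    *freed = app_shutdown();
    return rc;
}
```

A `main` that calls `run_suite(1000, &freed)` can then be built with `-g` and run as `valgrind --leak-check=full ./suite`; with a few thousand transactions instead of millions, the run should stay short.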
BTW, when running the smaller tests, you can limit your process size (ulimit/limit) to avoid the massive memory images and associated system stability issues.
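If you would rather set the limit from inside the test program than from the shell, a minimal sketch using the POSIX `getrlimit`/`setrlimit` calls might look like this. It caps `RLIMIT_AS` (the virtual address space limit that bash's `ulimit -v` sets on Linux), so a runaway leak makes malloc() fail instead of taking the whole machine down. `limit_address_space` is a hypothetical name, and the right cap depends on your application.

```c
#include <sys/resource.h>

/* Hypothetical helper: lowers the soft RLIMIT_AS of the calling process to
 * max_bytes. The soft limit is clamped to the hard limit, which an
 * unprivileged process cannot raise. Returns 0 on success, -1 on error. */
int limit_address_space(rlim_t max_bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;
    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}
```

Calling it at the top of the test harness's `main` limits only that process and its children, which is the same scope a `ulimit` in the launching shell would have.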
Upvotes: 1