Reputation: 19941
I know there are already a few threads on memory profiling with massif and other tools, but I wonder whether there are any tools or common techniques for runtime memory profiling in a production environment.
One can imagine an implementation where every class provides a memSize() function, and containers are extended to do the same by calling memSize() on all their members and adding their own size (or a size estimate). Then at any point in time you can query the app to see which of your major data structures are using most of your memory and how that changes over time.
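A minimal C++ sketch of that idea, assuming a hypothetical `MemSized` interface and a hypothetical `Customer` type: each object reports its own size plus the heap memory it owns, and a container wrapper adds the unused part of its element buffer before asking each element for its total. Values based on `capacity()` are estimates; allocator bookkeeping and padding between heap blocks are not counted, which is part of the guessing problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: every tracked class reports its own footprint in bytes.
struct MemSized {
    virtual ~MemSized() = default;
    virtual std::size_t memSize() const = 0;
};

// Hypothetical leaf type: the object itself plus the heap buffer its string owns.
struct Customer : MemSized {
    std::string name;
    explicit Customer(std::string n) : name(std::move(n)) {}
    std::size_t memSize() const override {
        // capacity() is an estimate: with the small-string optimisation a short
        // name may live inside the object and own no heap buffer at all.
        return sizeof(*this) + name.capacity();
    }
};

// Container wrapper: its own object, the unused part of the element buffer, and
// each element's total (which already includes the element's slot in the buffer).
template <typename T>
struct TrackedVector : MemSized {
    std::vector<T> items;
    std::size_t memSize() const override {
        std::size_t total = sizeof(*this) + (items.capacity() - items.size()) * sizeof(T);
        for (const T& item : items) total += item.memSize();
        return total;
    }
};
```

Calling `memSize()` on a structure that other threads are modifying still requires taking the owner's lock, as noted in the question.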
Unfortunately, the above strategy can be tricky: you'd have to deal with things like locking and memory alignment, and sometimes you won't know how big a third-party data structure is, so you'd have to guess. Overall, it also seems like quite a lot of work to add this to every class...
So, to the actual question: what's a good way to monitor memory usage, and the causes of memory growth, at runtime in a production process?
Upvotes: 3
Views: 1094
Reputation: 67547
If you are willing to:
then you can keep track of what's allocated, along with each block's size and owner. In our embedded device we have extended the memory manager to record additional information for every allocated memory block. In our case we keep track of the following:
We have a mechanism by which we can ask the system to walk the block list (a linked list) and dump the information above for each block to a .csv file. The dump can be triggered automatically when the system runs out of memory or detects memory corruption, and it can also be triggered manually at any time. Once the .csv file is generated, a Perl script digests it and groups the requests by originating thread, stack trace, etc. This is very handy: for example, it lets us see how much memory, and how many allocations, came from a given place in the code.
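A minimal C++ sketch of such an extended memory manager, under stated assumptions: the answer's actual list of recorded fields is not shown here, so the `BlockHeader` fields (size, source file and line, sequence number) are hypothetical, and callers go through a hypothetical `trackedAlloc`/`trackedFree` pair rather than a replaced system allocator. Each block carries a header linked into a doubly linked list, so walking the list and writing one CSV row per live block is straightforward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical per-block header; the fields are illustrative, not the answer's list.
// alignas keeps the user pointer (header + 1) aligned like a plain malloc result.
struct alignas(std::max_align_t) BlockHeader {
    BlockHeader* prev;
    BlockHeader* next;
    std::size_t size;   // bytes requested by the caller
    const char* file;   // allocation site (a cheap stand-in for a stack trace)
    int line;
    unsigned long seq;  // allocation order, handy when comparing snapshots
};

namespace tracker {
BlockHeader* head = nullptr;  // most recently allocated live block first
std::mutex lock;              // guards the list and the sequence counter
unsigned long nextSeq = 0;
}

void* trackedAlloc(std::size_t size, const char* file, int line) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) return nullptr;
    std::lock_guard<std::mutex> guard(tracker::lock);
    auto* h = new (raw) BlockHeader{nullptr, tracker::head, size, file, line, tracker::nextSeq++};
    if (tracker::head != nullptr) tracker::head->prev = h;
    tracker::head = h;
    return h + 1;  // caller's memory starts right after the header
}

void trackedFree(void* p) {
    if (p == nullptr) return;
    BlockHeader* h = static_cast<BlockHeader*>(p) - 1;
    {
        std::lock_guard<std::mutex> guard(tracker::lock);
        assert(h->prev != nullptr || tracker::head == h);  // cheap consistency check
        if (h->prev != nullptr) h->prev->next = h->next; else tracker::head = h->next;
        if (h->next != nullptr) h->next->prev = h->prev;
    }
    std::free(h);
}

// Walks the block list and writes one CSV row per live block.
void dumpCsv(std::ostream& out) {
    std::lock_guard<std::mutex> guard(tracker::lock);
    out << "seq,size,file,line\n";
    for (const BlockHeader* h = tracker::head; h != nullptr; h = h->next)
        out << h->seq << ',' << h->size << ',' << h->file << ',' << h->line << '\n';
}

std::string snapshotCsv() {
    std::ostringstream out;
    dumpCsv(out);
    return out.str();
}

// Records the call site automatically.
#define TRACKED_ALLOC(n) trackedAlloc((n), __FILE__, __LINE__)
```

The header is `alignas(std::max_align_t)` so the pointer returned to the caller keeps `malloc`'s alignment guarantee. If global `operator new` were routed through this allocator, writing the dump while holding the lock could recurse into it, so a production version would need a reentrancy guard or a preallocated output buffer.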
A technique that we find extremely useful for finding leaks is to generate two or more .csv reports at different times while some process is running. Comparing the digested memory logs lets us easily spot leaked memory.
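The answer uses a Perl script for the digest; purely as an illustration, the same grouping and comparison can be sketched in C++17. It assumes a hypothetical CSV layout of `seq,size,file,line`, groups live blocks by allocation site, and reports the sites whose byte count grew between two snapshots, largest growth first. A site that keeps growing across several snapshots taken under a steady workload is a leak candidate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Totals for one allocation site in one snapshot.
struct SiteTotals {
    std::size_t bytes = 0;
    std::size_t blocks = 0;
};

// Keyed by "file:line"; std::map keeps the report order deterministic.
using Digest = std::map<std::string, SiteTotals>;

// Groups the rows of a "seq,size,file,line" CSV snapshot by allocation site.
Digest digestCsv(const std::string& csv) {
    Digest digest;
    std::istringstream in(csv);
    std::string row;
    std::getline(in, row);  // skip the header row
    while (std::getline(in, row)) {
        std::istringstream fields(row);
        std::string seq, size, file, line;
        std::getline(fields, seq, ',');
        std::getline(fields, size, ',');
        std::getline(fields, file, ',');
        std::getline(fields, line, ',');
        if (size.empty()) continue;  // ignore blank or truncated rows
        SiteTotals& totals = digest[file + ":" + line];
        totals.bytes += std::stoull(size);
        totals.blocks += 1;
    }
    return digest;
}

struct Growth {
    std::string site;
    long long bytesDelta;
    long long blocksDelta;
};

// Sites whose live byte count increased from `before` to `after`, largest first.
std::vector<Growth> growthBetween(const Digest& before, const Digest& after) {
    std::vector<Growth> grown;
    for (const auto& [site, now] : after) {
        SiteTotals then;  // stays zero if the site did not appear earlier
        if (auto it = before.find(site); it != before.end()) then = it->second;
        long long bytesDelta = static_cast<long long>(now.bytes) - static_cast<long long>(then.bytes);
        long long blocksDelta = static_cast<long long>(now.blocks) - static_cast<long long>(then.blocks);
        if (bytesDelta > 0) grown.push_back({site, bytesDelta, blocksDelta});
    }
    std::sort(grown.begin(), grown.end(),
              [](const Growth& a, const Growth& b) { return a.bytesDelta > b.bytesDelta; });
    return grown;
}
```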
We've found that the overhead of recording this information is in the noise, so we keep this functionality enabled in production systems; when a unit fails in the field, we can collect the .csv files and perform post-mortem analysis.
Upvotes: 2