Reputation: 21
I have a Python script which uses an open-source PyTorch model, and this code has a memory leak. I am running it under memory_profiler with mprof run --include-children python my_script.py
and I get the following memory-usage plot:
I am trying to find the reason for the leak with the standard-library module tracemalloc:
import tracemalloc

tracemalloc.start(25)

while True:
    ...
    snap = tracemalloc.take_snapshot()
    domain_filter = tracemalloc.DomainFilter(True, 0)
    snap = snap.filter_traces([domain_filter])
    stats = snap.statistics('lineno', True)
    for stat in stats[:10]:
        print(stat)
Looking only at the tracemalloc output, I am not able to identify the problem. My assumption is that the problem is in a C extension, but I would like to make sure this is true. I tried to change the domain with DomainFilter, but I only get output for domain 0.
Also, I don't understand the meaning of the parameter that tracemalloc.start(frameno) takes: frameno is the number of most recent frames to store per allocation traceback, but nothing seems to change when I vary it.
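From the documentation I would expect frameno to only become visible when grouping statistics by 'traceback' rather than 'lineno', something like this (I am not sure I am using it correctly):

import tracemalloc

tracemalloc.start(25)   # keep up to 25 frames per allocation traceback
...
snap = tracemalloc.take_snapshot()
# With 'lineno' every statistic is keyed on a single source line, so frameno
# has no visible effect there; 'traceback' shows the stored frames instead.
for stat in snap.statistics('traceback')[:3]:
    for line in stat.traceback.format():
        print(line)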
What can I do next to find the place in the code that causes the memory leak?
Looking forward to your answer.
Upvotes: 2
Views: 4963
Reputation: 1069
Given that your guess is that the problem is in a C extension and that you want to make sure this is true, I would suggest using a tool that is less Python-specific, such as https://github.com/vmware/chap, at least if you are able to run your program on Linux.
What you will need to do is run your script (uninstrumented) and at some point gather a live core (for example, using "gcore pid-of-your-running-program").
Once you have that core, open that core in chap ("chap your-core-file-path") and try the following command from the chap prompt:
summarize writable
The output will be something like this, but your numbers will likely vary considerably:
chap> summarize writable
5 ranges take 0x2021000 bytes for use: stack
6 ranges take 0x180000 bytes for use: python arena
1 ranges take 0xe1000 bytes for use: libc malloc main arena pages
4 ranges take 0x84000 bytes for use: libc malloc heap
8 ranges take 0x80000 bytes for use: used by module
1 ranges take 0x31000 bytes for use: libc malloc mmapped allocation
4 ranges take 0x30000 bytes for use: unknown
29 writable ranges use 0x23e7000 (37,646,336) bytes.
The lines in the summary are given in decreasing order of byte usage, so you can follow that order. Looking at the top one first, we see that the use is "stack":
5 ranges take 0x2021000 bytes for use: stack
This particular core was for a very simple python program that starts 4 extra threads and has all 5 threads sleep. Large stack allocations can happen rather easily with a multi-threaded python program because python uses pthreads to create additional threads, and pthreads uses the ulimit value for stack size as its default. If your program has a similarly large value, you can change the stack size in one of several ways, including running "ulimit -s" with a smaller value in the parent process to change the default stack size. To see what values actually make sense, you can use the following command from the chap prompt:
chap> describe stacks
Thread 1 uses stack block [0x7fffe22bc000, 7fffe22dd000)
current sp: 0x7fffe22daa00
Peak stack usage was 0x7798 bytes out of 0x21000 total.
Thread 2 uses stack block [0x7f51ec07c000, 7f51ec87c000)
current sp: 0x7f51ec87a750
Peak stack usage was 0x2178 bytes out of 0x800000 total.
Thread 3 uses stack block [0x7f51e7800000, 7f51e8000000)
current sp: 0x7f51e7ffe750
Peak stack usage was 0x2178 bytes out of 0x800000 total.
Thread 4 uses stack block [0x7f51e6fff000, 7f51e77ff000)
current sp: 0x7f51e77fd750
Peak stack usage was 0x2178 bytes out of 0x800000 total.
Thread 5 uses stack block [0x7f51e67fe000, 7f51e6ffe000)
current sp: 0x7f51e6ffc750
Peak stack usage was 0x2178 bytes out of 0x800000 total.
5 stacks use 0x2021000 (33,689,600) bytes.
So what you see above is that 4 of the stacks are 8 MiB in size but could easily be well under 64 KiB.
Your program may not have any issues with stack size, but if it does, you can fix them as described above.
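If you do need smaller thread stacks and prefer to fix it from Python code rather than via ulimit, one option (a generic sketch, not something specific to your script) is threading.stack_size():

import threading

# Ask for a 512 KiB stack for threads created after this call; the value is
# illustrative -- pick something safely above the peak usage reported by
# "describe stacks" in chap.
threading.stack_size(512 * 1024)

def worker():
    ...  # whatever the thread does

t = threading.Thread(target=worker)
t.start()
t.join()

Note that this only affects threads created through the threading module; threads created inside native libraries will still use their own defaults.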
Continuing with checking for causes of growth, look at the next line from the summary:
6 ranges take 0x180000 bytes for use: python arena
So python arenas account for the next most memory. These are used strictly for python-specific allocations. If this value is large in your case, it argues against your theory that C allocations are the culprit, but there is more you can do later to figure out how those python allocations are being used.
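As a complementary in-process check (separate from chap, using only the standard library), you can periodically count live Python objects by type and watch which types keep growing; a minimal sketch:

import gc
from collections import Counter

def count_objects_by_type(top=10):
    # Count every object the garbage collector is currently tracking,
    # grouped by type name.
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    for name, n in counts.most_common(top):
        print(f"{n:>8}  {name}")

# Call this between iterations of your loop and compare successive outputs;
# a type whose count keeps climbing points at container-based growth.
count_objects_by_type()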
Looking at the remaining lines of the summary, we see a few with "libc" as part of the "use" description:
1 ranges take 0xe1000 bytes for use: libc malloc main arena pages
4 ranges take 0x84000 bytes for use: libc malloc heap
1 ranges take 0x31000 bytes for use: libc malloc mmapped allocation
Note that libc is responsible for all that memory, but you can't conclude that the memory is used by non-python code, because for allocations beyond a certain size threshold (well under 4K; 512 bytes in recent CPython versions) python grabs memory via malloc rather than from one of the python arenas.
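If you want to cross-check from inside the live process how much memory CPython's small-object allocator is managing, CPython exposes an undocumented helper, sys._debugmallocstats(), that prints per-arena and per-pool statistics (CPython-specific, so treat it purely as a debugging aid):

import sys

# Prints pymalloc arena/pool/block statistics to stderr (CPython only).
sys._debugmallocstats()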
So let's assume that you have resolved any issues you might have had with stack usage and that your usage is mainly "python arena" or "libc malloc" related. The next thing you want to understand is whether that memory is mostly "used" (meaning allocated but never freed) or "free" (meaning freed but not given back to the operating system). You can do that as shown:
chap> count used
15731 allocations use 0x239388 (2,331,528) bytes.
chap> count free
1563 allocations use 0xb84c8 (754,888) bytes.
So in the above case, used allocations dominate, and what one should do is try to understand those used allocations. The case where free allocations dominate is much more complex and is discussed a bit in the user guide, but it would take too much time to cover here.
So let's assume for now that used allocations are the main cause of growth in your case. Next we can find out why we have so many used allocations.
The first thing we might want to know is whether any allocations were actually "leaked" in the sense that they are no longer reachable. This excludes the case where the growth is due to containers that keep growing while still being reachable.
One does this as follows:
chap> summarize leaked
0 allocations use 0x0 (0) bytes.
So for this particular core, as is pretty common for python cores, nothing was leaked. Your number may be non-zero. If it is non-zero but still much lower than the totals associated with memory used for "python" or "libc" reported above, you might just make a note of the leaks but continue to look for the real cause of growth. The user guide has some information about investigating leaks, but it is a bit sparse. If the leak count is actually big enough to explain your growth issue, you should investigate that next; if not, read on.
Now that you are assuming container-based growth the following commands are useful:
chap> redirect on
chap> summarize used
Wrote results to scratch/core.python_5_threads.summarize_used
chap> summarize used /sortby bytes
Wrote results to scratch/core.python_5_threads.summarize_used::sortby:bytes
The above creates two text files: one with a summary ordered by object counts and another with a summary ordered by the total bytes used directly by those objects.
At present chap has only very limited support for python: it finds python objects, in addition to anything allocated by libc malloc, but for python objects the summary only breaks out a few pattern-based categories. For example, %SimplePythonObject matches things like "int" and "str" that don't hold other python objects, while %ContainerPythonObject matches things like tuple, list, and dict that do hold references to other python objects. With that said, it should be pretty easy to tell from the summary whether the growth in used allocations is primarily due to objects allocated by python or objects allocated by native code.
So in this case, given that you specifically are trying to find out whether the growth is due to native code or not, look in the summary for counts like the following, all of which are python-related:
Pattern %SimplePythonObject has 7798 instances taking 0x9e9e8(649,704) bytes.
Pattern %ContainerPythonObject has 7244 instances taking 0xc51a8(807,336) bytes.
Pattern %PyDictKeysObject has 213 instances taking 0xb6730(747,312) bytes.
So in the core I have been using as an example, python allocations definitely dominate.
You will also see a line like the following, which is for allocations that chap does not yet recognize. You can't make assumptions about whether these are python-related or not.
Unrecognized allocations have 474 instances taking 0x1e9b8(125,368) bytes.
This will hopefully answer your basic question of what you can do next. At that point, at least, you will understand whether the growth is likely due to C code or python code, and depending on what you find, the chap user guide should help you go further from there.
Upvotes: 5