Firman

Reputation: 948

Memory leak when using lambda in Python class

I have detected a memory leak in Python if I use a lambda function inside a class. Here's the code to reproduce the leak:

import torch
# import numpy as np

class Class1(object):
    def __init__(self, x0):
        self.x0 = x0
        self.obj2 = Class2()
        self._leak_fcn = lambda: self.obj2.fcn()  # source of memory leak!

class Class2(object):
    def fcn(self):
        pass

def fcn(x0):
    obj1 = Class1(x0)
    return x0

def test_fcn():
    shape = (50000000, 3)
    y0 = torch.randn(shape).to(torch.double)
    # y0 = np.random.randn(*shape)
    y = fcn(y0)
    return y

for i in range(1000):
    print(i)
    test_fcn()

The memory leak happens even if I change it to numpy (without using PyTorch). No memory leak is detected if I comment out the line containing self._leak_fcn, or if I write _leak_fcn as a method instead of a lambda. What is happening here?

I am not sure if this leak is from Python itself, or if both PyTorch and NumPy suffer from the same leak. FYI: I am using Python 3.8.5.

EDIT: I know there is a memory leak here because if I run it for a long period, my memory fills up (observed with htop) and the process is killed when it runs out of memory.
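For reference, the method version I mean (with Class2 as above) looks roughly like this:

class Class1(object):
    def __init__(self, x0):
        self.x0 = x0
        self.obj2 = Class2()

    def _leak_fcn(self):  # plain method: nothing in the instance dict refers back to self
        return self.obj2.fcn()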

Upvotes: 2

Views: 2009

Answers (1)

Tim Boddy

Reputation: 1069

There is no leak here, simply a race as to whether the instances of Class1 get garbage collected soon enough for the torch or numpy buffers indirectly anchored by those Class1 instances to be freed before the process no longer has enough memory to allocate another such buffer. Garbage collection is needed to break the PyObject reference cycle mentioned by @ThierryLathuille.
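To make the cycle concrete: the lambda captures self in a closure cell, so the instance ends up referring to itself through its own __dict__. A minimal sketch, using stripped-down versions of the classes from the question:

class Class2(object):
    def fcn(self):
        pass

class Class1(object):
    def __init__(self):
        self.obj2 = Class2()
        self._leak_fcn = lambda: self.obj2.fcn()

obj = Class1()
cell = obj._leak_fcn.__closure__[0]
# The cycle: obj -> __dict__ -> lambda -> closure cell -> obj
print(cell.cell_contents is obj)  # True

Because every link in that cycle is a strong reference, only the cyclic garbage collector can reclaim the instance once the last external reference is dropped.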

That a delay in garbage collection is the issue can be proven by changing the example to import gc and add a call to gc.collect() to the last loop of the program; this guarantees that garbage collection happens soon enough (assuming the system has enough memory for the program to make it even once through the loop). To do this, add "import gc" to the top of the program and make the last loop look like this:

for i in range(1000):
    print(i)
    test_fcn()
    gc.collect()

You will then see that the program can run to completion (again, assuming you have enough memory to make it through the loop at least once).
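Part of why collection lags in the unmodified program is that CPython's cyclic collector is triggered by object allocation counts rather than by bytes allocated, so a few enormous buffers do not by themselves provoke a collection:

import gc
# Generation 0 is collected only after (allocations - deallocations)
# exceeds the first threshold; buffer *size* plays no role.
print(gc.get_threshold())  # typically (700, 10, 10)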

The second thing one might want to confirm is that garbage collection is simply not happening soon enough for this particular configuration, but would happen eventually. This is certainly the case, and provably so: reduce the memory used per numpy or torch buffer enough that the program does not run out of memory before garbage collection starts freeing some of those buffers, but not so much that the program could run to completion without any of those buffers being garbage collected at all.

To understand exactly what those numbers are, one needs to understand how large the program will be allowed to grow. On Linux, one limit is the total memory available to be used as "committed memory", though the system may be configured to allow some over-commit.

This can be roughly checked by looking at /proc/meminfo. On my system, subtracting Committed_AS from CommitLimit would suggest that if I run the program I will have less than 10 GB available to that program (assuming that other programs don't start or stop or change how much committed memory they use in the meantime).

$ grep Commit /proc/meminfo
CommitLimit:     9325344 kB
Committed_AS:     573964 kB
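The same check can be scripted; here is a rough sketch in Python, assuming a Linux /proc/meminfo that reports these two fields:

def commit_headroom_kb(path="/proc/meminfo"):
    """Rough estimate of remaining committable memory, in kB."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.split()[0])  # values are reported in kB
    return fields["CommitLimit"] - fields["Committed_AS"]

print(commit_headroom_kb() / (1024 * 1024), "GiB, approximately")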

As already reported, the space used per torch or numpy buffer is roughly 1.2 GB (50,000,000 * 3 * 8 bytes), so even with some over-commit of memory allowed I would expect the program to hold around 8 of those numpy or torch buffers before it crashes. In fact, on my system (using numpy rather than torch, by starting with the original program from the question, commenting out the torch lines, and removing the # from the numpy lines) it crashes at around 10 times through the loop:

$ python3 junk.py
0
1
2
3
4
5
6
7
8
9
10
Traceback (most recent call last):
  File "junk.py", line 29, in <module>
    test_fcn()
  File "junk.py", line 23, in test_fcn
    y0 = np.random.randn(*shape)
  File "mtrand.pyx", line 1233, in numpy.random.mtrand.RandomState.randn
  File "mtrand.pyx", line 1390, in  numpy.random.mtrand.RandomState.standard_normal
  File "_common.pyx", line 577, in numpy.random._common.cont
numpy.core._exceptions.MemoryError: Unable to allocate 1.12 GiB for an array with shape (50000000, 3) and data type float64
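The arithmetic behind that estimate, as a quick sanity check:

buffer_bytes = 50_000_000 * 3 * 8    # float64 elements at 8 bytes each
print(buffer_bytes)                  # 1_200_000_000 bytes per buffer
print(buffer_bytes / 2**30)          # ~1.12 GiB, matching the MemoryError above
print(10 * 10**9 // buffer_bytes)    # ~8 buffers fit in roughly 10 GB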

So suppose I change the shape tuple in the program as follows, reducing the size of the numpy buffer by a factor of 25:

# shape = (50000000, 3)
shape = (2000000, 3)

Now I expect each numpy buffer to take at least 2,000,000 * 3 * 8 = 48,000,000 bytes. There is no way on my small system that the program could have 1,000 of those allocated but not yet freed (that would take at least 48,000,000,000 bytes). However, the program runs to completion with the modified size, showing that garbage collection must be working.

The next question one might ask is how someone who was not able to spot the Python reference cycle from the source alone, as Thierry apparently did, could figure this out by analysis.

One way to do this analysis is to use chap, an open-source tool that runs on Linux and whose source is available at https://github.com/vmware/chap.

The input required by chap is a core from the program to be analyzed. As seen above, on my system the program crashed after around 10 times through the loop, so I chose to gather a live core (using the original program with the minor commenting changes to use numpy) by running gcore <pid> after the program had run 8 or so times through the loop.

Here is the analysis using chap, starting from the point where chap has reached the chap prompt. We know the numpy buffers are large, so we can find them just by using a command that finds any buffers that are at least 0x1000000 bytes. Running that command shows that there were 7 such allocations at the time the core was gathered:

chap> describe used /minsize 1000000
Anchored allocation at 7fc6255b6010 of size 47868ff0

Anchored allocation at 7fc66ce1f010 of size 47868ff0

Anchored allocation at 7fc6b4688010 of size 47868ff0

Anchored allocation at 7fc6fbef1010 of size 47868ff0

Anchored allocation at 7fc74375a010 of size 47868ff0

Anchored allocation at 7fc78afc3010 of size 47868ff0

Anchored allocation at 7fc7d282c010 of size 47868ff0

7 allocations use 0x1f4adef90 (8,400,007,056) bytes.

If we pick one of those large allocations, we can see how it is referenced, to see why it is still in memory. One way to do this is to use the following command, which specifies that we should start from the given allocation, extend to any allocations that contain a pointer to the start (offset 0) of that allocation, then stop:

chap> describe allocation 7fc7d282c010 /extend @0<-=>StopHere
Anchored allocation at 7fc7d282c010 of size 47868ff0

Anchored allocation at 7fc82242c620 of size 50
This allocation matches pattern SimplePythonObject.
This has reference count 1 and python type 0x7fc8220a88e0 (numpy.ndarray)

2 allocations use 0x47869040 (1,200,001,088) bytes.

The result shows that there is just one such allocation and, not surprisingly given the source code, it is of type numpy.ndarray. It matches the chap pattern SimplePythonObject because allocations of type numpy.ndarray cannot reference other python objects (the big buffer is not one) and don't have garbage collection headers. Such an object will only be freed when its reference count transitions to 0. The key thing to observe here is that the reference count is 1, meaning we need to find just one reference to that numpy.ndarray to understand why it is still in memory.
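The reference count chap shows is just the ordinary CPython reference count; for illustration, it can be observed from Python itself with sys.getrefcount, which reports one extra reference for its own argument:

import sys
import numpy as np

a = np.random.randn(10, 3)
print(sys.getrefcount(a))  # 2: the name 'a' plus getrefcount's temporary argument
b = a                      # a second reference keeps the array alive
print(sys.getrefcount(a))  # 3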

Continuing from the numpy.ndarray, we see that 7 things reference the start of that allocation. Six of them are instances of the python type frame and are likely uninteresting, both because there are six of them and because we only need to explain one reference to the numpy.ndarray. There is also a single allocation that matches pattern PyDictValuesArray (because it holds the values for a split python dict), and this one is of more interest:

chap> describe allocation 7fc82242c620 /extend @0<-=>StopHere
Anchored allocation at 7fc82242c620 of size 50
This allocation matches pattern SimplePythonObject.
This has reference count 1 and python type 0x7fc8220a88e0 (numpy.ndarray)

Anchored allocation at 562868cdae20 of size 268
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 562868e44ba0 of size 218
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 562868e48830 of size 238
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 562868e65b10 of size 238
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 7fc81a1f5030 of size 1f8
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 7fc81e5a3200 of size 1d0
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc82393aa00 (frame)

Anchored allocation at 7fc8223f9968 of size 28
This allocation matches pattern PyDictValuesArray.
It contains values for a split python dict.

8 allocations use 0xd30 (3,376) bytes.

An allocation that matches pattern PyDictValuesArray must be referenced by the ma_values field of a python dict. The allocation holding the values is not itself reference counted; it is freed either when the dict is freed or when the dict no longer needs that allocation. Continuing from that allocation we can see the dict:

chap> describe allocation 7fc8223f9968 /extend @0<-=>StopHere
Anchored allocation at 7fc8223f9968 of size 28
This allocation matches pattern PyDictValuesArray.
It contains values for a split python dict.

Anchored allocation at 7fc823a646a8 of size 48
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc823939780 (dict)

2 allocations use 0x70 (112) bytes.

It is worth noting here that the dict has reference count 1 (so again only one reference needs to be explained), that the garbage collector is tracking this object, and that the header for the actual dict, as opposed to the preceding garbage collection header, starts at offset 0x18 into the allocation. This means that for references to the dict we need to look for things that point to offset 0x18 of the allocation. (Pointers to offset 0 of the allocation would generally be used by garbage collection.)

Using this information, we can continue to see who references the dict, and find that it is referenced by an instance of Class1:

chap> describe allocation 7fc823a646a8 /extend @18<-=>StopHere
Anchored allocation at 7fc823a646a8 of size 48
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc823939780 (dict)

Anchored allocation at 7fc822428d50 of size 38
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x562868afee88 (Class1)

2 allocations use 0x80 (128) bytes.

This is not surprising, given the program: the dict must be holding the various fields of the instance of Class1. Again, note that the instance of Class1 has a reference count of 1, meaning we need to explain just one reference to understand why that instance could still be in memory.

Continuing, and noting that the allocation has a garbage collection header, we can see that the Class1 instance is referenced by an instance of the python cell type, which again is tracked for garbage collection and has reference count 1:

chap> describe allocation 7fc822428d50 /extend @18<-=>StopHere
Anchored allocation at 7fc822428d50 of size 38
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x562868afee88 (Class1)

Anchored allocation at 7fc823a1a870 of size 30
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc82393d6c0 (cell)

2 allocations use 0x68 (104) bytes.

Continuing, we can see that the cell is referenced by a tuple, which again has reference count 1:

chap> describe allocation 7fc823a1a870 /extend @18<-=>StopHere
Anchored allocation at 7fc823a1a870 of size 30
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc82393d6c0 (cell)

Anchored allocation at 7fc822428fb8 of size 38
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc823936320 (tuple)

2 allocations use 0x68 (104) bytes.

When checking references to the tuple, it appears to be referenced by both another tuple and a function. The referencing tuple is not of interest for our purposes, because the garbage collector is not tracking it and its reference count is 0. The function is definitely interesting: it is being tracked, it has reference count 1, and from the source we know we are looking for such a thing:

chap> describe allocation 7fc822428fb8 /extend @18<-=>StopHere
Anchored allocation at 7fc822428fb8 of size 38
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc823936320 (tuple)

Anchored allocation at 562868aff290 of size 2b8
This allocation matches pattern ContainerPythonObject.
This allocation is not currently tracked by the garbage collector.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 0 and python type 0x7fc823936320 (tuple)
 
Anchored allocation at 7fc81a114938 of size 88
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc82393a860 (function)

3 allocations use 0x378 (888) bytes.

Continuing from the function, we complete the cycle, because the allocation at 0x7fc8223f9968 that matches pattern PyDictValuesArray has already been seen earlier in our traversal.

chap> describe allocation 7fc81a114938 /extend @18<-=>StopHere
Anchored allocation at 7fc81a114938 of size 88
This allocation matches pattern ContainerPythonObject.
The garbage collector considers this allocation to be reachable.
This has a PyGC_Head at the start so the real PyObject is at offset 0x18.
This has reference count 1 and python type 0x7fc82393a860 (function)

Anchored allocation at 7fc8223f9968 of size 28
This allocation matches pattern PyDictValuesArray.
It contains values for a split python dict.
 
2 allocations use 0xb0 (176) bytes.

So we have a cycle that looks like this:

Class1 -> dict -> %PyDictValuesArray -> function -> tuple -> cell -> Class1

The allocations in the cycle itself do not take all that much memory, but freeing them requires garbage collection, and the %PyDictValuesArray in the cycle also holds the numpy.ndarray, which holds the big buffer.
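The need for garbage collection can be double-checked from within Python as well. A small sketch, again using stripped-down versions of the question's classes: with the cycle in place, dropping the last external reference does not free the instance until the collector runs.

import gc
import weakref

class Class2(object):
    def fcn(self):
        pass

class Class1(object):
    def __init__(self):
        self.obj2 = Class2()
        self._leak_fcn = lambda: self.obj2.fcn()

obj = Class1()
probe = weakref.ref(obj)
del obj
print(probe() is None)  # False: the cycle keeps the instance (and anything it anchors) alive
gc.collect()
print(probe() is None)  # True: the collector broke the cycle and freed the instance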

People, including the author of the question and several commenters, have discussed valid solutions to avoid the growth, so I will avoid any discussion of the fix itself and limit this answer to the above analysis of the reason for the growth.

Upvotes: 1
