Reputation: 423
$ python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def mem():
...     with open("/proc/{}/status".format(os.getpid())) as f:
...         for line in f:
...             if 'VmRSS' in line:
...                 return line.strip()
...
>>> import gc
>>> import numpy
>>> import os
>>> numpy.version.version
'1.16.4'
>>> print(mem())
VmRSS: 27000 kB
>>> a = [numpy.random.random(size=(128, 128)) for _ in range(5000)]
>>> print(mem())
VmRSS: 668876 kB
>>> gc.collect()
0
>>> print(mem())
VmRSS: 668876 kB
>>> a = None
>>> print(mem())
VmRSS: 455432 kB
>>> gc.collect()
0
>>> print(mem())
VmRSS: 455432 kB
>>> del a
>>> print(mem())
VmRSS: 455432 kB
>>> gc.collect()
0
>>> print(mem())
VmRSS: 455432 kB
In the above snippet I allocate about 600MB of medium-sized numpy arrays. (This behaviour doesn't occur when the arrays are significantly smaller or larger, and it doesn't occur if you only use Python objects.) When I then drop the list of arrays, the process still hangs on to over two-thirds of the memory, and no amount of forced garbage collection or deletion returns that memory to the OS.
I'm fairly sure this isn't a memory leak in numpy, because new numpy allocations will reuse that memory (although Python allocations will not). Can anyone shed light on why this happens?
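For reference, a rough sketch of the arithmetic behind the "about 600MB" figure. It assumes the arrays are float64 (8 bytes per element), which is what numpy.random.random returns, and it only counts the raw data buffers, not per-array object overhead.

```python
# Back-of-the-envelope size of the allocation in the snippet above.
# Assumption: float64 elements, i.e. 8 bytes each.
ROWS, COLS = 128, 128
BYTES_PER_FLOAT64 = 8
N_ARRAYS = 5000

bytes_per_array = ROWS * COLS * BYTES_PER_FLOAT64  # data buffer of one array
total_bytes = bytes_per_array * N_ARRAYS           # all 5000 data buffers

print(bytes_per_array // 1024, "KiB per array")
print(total_bytes // (1024 * 1024), "MiB in total")
```

Each data buffer comes to 128 KiB and the total to 625 MiB, which is in line with VmRSS growing from 27000 kB to 668876 kB.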
Edit: It looks like this is at least partly to do with the system allocator:
$ LD_PRELOAD=/tmp/tmp.Rl0Ofo69sZ/jemalloc-5.2.1/lib/libjemalloc.so python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def mem():
...     with open("/proc/{}/status".format(os.getpid())) as f:
...         for line in f:
...             if 'VmRSS' in line:
...                 return line.strip()
...
>>> import numpy
>>> import os
>>> print(mem())
VmRSS: 33040 kB
>>> a = [numpy.random.random(size=(128, 128)) for _ in range(5000)]
>>> print(mem())
VmRSS: 694912 kB
>>> a = None
>>> print(mem())
VmRSS: 159568 kB
I'm currently putting the rest down to boring old memory fragmentation, but it would be interesting to know if there's anything else going on here (numpy does seem to do some basic caching of its own).
Upvotes: 3
Views: 393
Reputation: 250
Numpy is a C extension that manages its own memory, so Python's garbage collector is not involved. Numpy allocates space for its arrays on the heap using malloc() or calloc() and releases that space with free() when it is done (this happens when you set a = None, which drops the last reference to the arrays). However, the heap allocator doesn't necessarily return memory to the OS when it is free()'d; whether that happens depends on how the memory was obtained from the OS, on heap fragmentation, and so on. That memory can still be reused by the process, as you have observed.
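If you want to see whether the allocator is merely holding on to free memory, one thing to try is asking glibc to give it back with malloc_trim(). The following is a minimal sketch, assuming Linux with glibc; the helper names rss_kb and trim_heap are made up for illustration, and on other C libraries trim_heap simply returns None. Whether it actually lowers VmRSS in your case depends on the allocator state, so treat it as an experiment rather than a fix.

```python
import ctypes
import ctypes.util
import os


def rss_kb():
    """Return VmRSS in kB read from /proc, or None if it is unavailable."""
    try:
        with open("/proc/{}/status".format(os.getpid())) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def trim_heap():
    """Ask glibc to return free heap memory to the OS.

    Returns the result of malloc_trim(0) (1 if memory was released,
    0 otherwise), or None if the C library does not provide malloc_trim.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        libc = ctypes.CDLL(libc_name)
        trim = libc.malloc_trim  # glibc-specific; missing elsewhere
    except (OSError, AttributeError):
        return None
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    return trim(0)


before = rss_kb()
result = trim_heap()
after = rss_kb()
print("malloc_trim returned", result, "- VmRSS", before, "->", after, "kB")
```

Run it in the same session right after a = None and compare the two VmRSS values. If the number drops, the memory was free but still held by the allocator rather than leaked.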
Upvotes: 5