Adomas Baliuka

Reputation: 1602

Make a Python memory leak on purpose

I'm looking for an example that purposely makes a memory leak in Python.

It should be as short and simple as possible and ideally not use non-standard dependencies (that could simply do the memory leak in C code) or multi-threading/processing.

I've seen memory leaks achieved before but only when bad things were being done to libraries such as matplotlib. Also, there are many questions about how to find and fix memory leaks in Python, but they all seem to be big programs with lots of external dependencies.

My motivation is to understand how good Python's GC really is. I know it detects reference cycles, but can it be tricked? Is there some way to leak memory? The most restrictive version of this problem may be impossible to solve; in that case, I'd be very happy to see a rigorous argument why. Ideally, the answer should refer to the actual implementation and not just state that "an ideal garbage collector would be ideal and disallow memory leaks".

For nitpicking purposes: An ideal solution to the problem would be a program like this:

# Use Python version at least v3.10
# May use imports.
# Bonus points for only standard library.
# If the problem is unsolvable otherwise (please argue that it is),
# then you may use e.g. Numpy, Scipy, Pandas. Minus points for Matplotlib.

def memleak():
    # do whatever you want but only within this function
    # No global variables!
    # Bonus points for no observable side-effects (besides memory use)
    # ...

for _ in range(100):
    memleak()

The function must return and be called multiple times. Goals, in increasing order of bonus points (higher number = more bonus points):

  1. The program keeps using more memory until it crashes.
  2. After calling the function multiple times (e.g. the 100 calls specified above), the program may continue doing other (normal) things, and the memory leaked during the function is never freed.
  3. Like 2, but the memory cannot be freed even by calling gc manually or by similar means.

Upvotes: 2

Views: 1792

Answers (1)

Brian61354270

Reputation: 14423

One way to "trick" CPython's garbage collector into leaking memory is by invalidating an object's reference count. We can do this by creating an extraneous strong reference that never gets deleted.

To create a new strong reference, we need to invoke Py_IncRef (or Py_NewRef) from Python's C API. This can be done via ctypes.pythonapi:

import ctypes
import sys

# Create C API callable
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None

# Create an arbitrary object.
obj = object()

# Print the number of references to obj.
# This should be 2:
#  - one for the global variable 'obj'
#  - one for the argument inside of 'sys.getrefcount'
print(sys.getrefcount(obj))

# Create a new strong reference.
inc_ref(obj)

# Print the number of references to obj.
# This should be 3 now.
print(sys.getrefcount(obj))

outputs

2
3
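As a rough sketch of why it is the missing decrement that causes the leak, the snippet below pairs the increment with Py_DecRef, which is also exposed through ctypes.pythonapi. The variable names before, during and after are only illustrative. It compares differences between counts rather than absolute values, since the absolute numbers reported by sys.getrefcount can vary between interpreter versions.

```python
import ctypes
import sys

# C API callables for incrementing and decrementing a reference count
inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None

dec_ref = ctypes.pythonapi.Py_DecRef
dec_ref.argtypes = [ctypes.py_object]
dec_ref.restype = None

obj = object()

before = sys.getrefcount(obj)
inc_ref(obj)                    # extra strong reference
during = sys.getrefcount(obj)
dec_ref(obj)                    # give the extra reference back
after = sys.getrefcount(obj)

print(during - before)  # 1
print(after == before)  # True
```

If the dec_ref call is omitted, the count stays one too high for the rest of the process, which is exactly the situation exploited here.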

Concretely, you can write your memleak function as

import ctypes

def memleak():
    # Create C API callable
    inc_ref = ctypes.pythonapi.Py_IncRef
    inc_ref.argtypes = [ctypes.py_object]
    inc_ref.restype = None

    # Allocate a large object
    obj = list(range(10_000_000))

    # Increment its ref count
    inc_ref(obj)

    # obj will have a dangling reference after this function exits

memleak()  # leaks memory
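To watch the leak grow from one call to the next, one option is the standard library's tracemalloc module. This is a minimal sketch, not a precise measurement: the parameter n, the smaller list size, and the names samples and leaked are assumptions made here so the example runs quickly.

```python
import ctypes
import tracemalloc


def memleak(n=50_000):
    # Same technique as above, with a smaller (assumed) list size
    inc_ref = ctypes.pythonapi.Py_IncRef
    inc_ref.argtypes = [ctypes.py_object]
    inc_ref.restype = None

    obj = list(range(n))
    inc_ref(obj)  # the list can no longer reach a ref count of zero


tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

samples = []
for _ in range(5):
    memleak()
    current, _ = tracemalloc.get_traced_memory()
    samples.append(current - baseline)

tracemalloc.stop()

leaked = samples[-1]
for i, size in enumerate(samples, start=1):
    print(f"after call {i}: about {size / 1e6:.1f} MB still allocated")
```

Each call should add at least the size of the list's pointer array (8 bytes per element on a 64-bit build) plus the integer objects it keeps alive, and none of it is released when the function returns.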

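An object with a dangling strong reference will never be freed by reference counting, because its count can no longer drop to zero. The cyclic garbage collector (the gc module) won't detect it as unreachable either: the extra count looks like a reference held from somewhere outside any cycle. Running gc manually via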

import gc
gc.collect()

will have no effect.
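One way to check this claim is with a weak reference, which does not keep its target alive. The sketch below is only an illustration: the Payload class and the make helper are hypothetical names introduced here, and the class exists solely because plain lists cannot be weakly referenced.

```python
import ctypes
import gc
import weakref

inc_ref = ctypes.pythonapi.Py_IncRef
inc_ref.argtypes = [ctypes.py_object]
inc_ref.restype = None


class Payload:
    """Hypothetical wrapper; list objects do not support weak references."""

    def __init__(self, n):
        self.data = list(range(n))


def make(leak):
    obj = Payload(1000)
    if leak:
        inc_ref(obj)  # extra strong reference that is never released
    # Only a weak reference escapes the function
    return weakref.ref(obj)


leaked = make(leak=True)
freed = make(leak=False)

gc.collect()

print(leaked() is not None)  # True: the object is still alive
print(freed() is None)       # True: the object was freed on return
```

The leaked object survives both the function return and the explicit collection, while the control object disappears as soon as its last real reference goes away.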

Upvotes: 1
