Reputation: 41

What is the memory cost of multiple duplicate lambda functions in Python?

I'm wondering how the Python interpreter actually handles lambda functions in memory.

If I have the following:

def squares():
    return [(lambda x: x**2)(num) for num in range(1000)]

Will this create 1000 instances of the lambda function in memory, or is Python smart enough to know that each of these 1000 lambdas is the same, and therefore store them as one function in memory?

Upvotes: 4

Answers (2)

gilch

Reputation: 11651

TL;DR: The memory cost of the lambda objects in your example is the size of one lambda, but only while the squares() function is running, even if you hold a reference to its return value, because the returned list contains no lambda objects.

But even in cases where you do keep more than one function instance created from the same lambda expression (or def statement for that matter), they share the same code object, so the memory cost for each additional instance is less than the cost of the first instance.

In your example,

[(lambda x: x**2)(num) for num in range(1000)]

you're only storing the result of the lambda invocation in the list, not the lambda itself, so the lambda object's memory will be freed.

When exactly the lambda objects get garbage collected depends on your Python implementation. CPython should be able to do it immediately because the reference count drops to 0 each loop:

>>> class PrintsOnDel:
...     def __del__(self):
...       print('del')  # We can see when this gets collected.
...
>>> [[PrintsOnDel(), print(x)][-1] for x in [1, 2, 3]]  # Freed each loop.
1
del
2
del
3
del
[None, None, None]

PyPy is another story.

>>>> from __future__ import print_function
>>>> class PrintsOnDel:
....   def __del__(self):
....     print('del')
....
>>>> [[PrintsOnDel(), print(x)][-1] for x in [1, 2, 3]]
1
2
3
[None, None, None]
>>>> import gc
>>>> gc.collect()  # Not freed until the gc actually runs!
del
del
del
0

It will create 1000 different lambda instances over time, but they won't all be in memory at once (in CPython) and they'll all point to the same code object, so having multiple instances of a function is not as bad as it sounds:

>>> a, b = [lambda x: x**2 for x in [1, 2]]
>>> a is b  # Different lambda objects...
False
>>> a.__code__ is b.__code__  # ...point to the same code object.
True

Disassembling the bytecode can help you to understand exactly what the interpreter is doing:

>>> from dis import dis
>>> dis("[(lambda x: x**2)(num) for num in range(1000)]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x000001D11D066870, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (range)
              8 LOAD_CONST               2 (1000)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x000001D11D066870, file "<dis>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (num)
              8 LOAD_CONST               0 (<code object <lambda> at 0x000001D11D0667C0, file "<dis>", line 1>)
             10 LOAD_CONST               1 ('<listcomp>.<lambda>')
             12 MAKE_FUNCTION            0
             14 LOAD_FAST                1 (num)
             16 CALL_FUNCTION            1
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE

Disassembly of <code object <lambda> at 0x000001D11D0667C0, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (2)
              4 BINARY_POWER
              6 RETURN_VALUE

Note the 12 MAKE_FUNCTION instruction each loop. It does make a new lambda instance every time. CPython's VM is a stack machine. Arguments get pushed to the stack by other instructions, and then consumed by later instructions that need them. Notice that above that MAKE_FUNCTION instruction is another one pushing an argument for it.

LOAD_CONST               0 (<code object <lambda>...

So it re-uses the code object.

Upvotes: 1

Massifox

Reputation: 4487

Your code create a single lambda function and calls it 1000 times, without creating a new object at each iteration. Because at each iteration you store the result of the lambda function not the function itself. It is equivalent to:

def square(x): return x*x # or def square x = lambda x: x*x
[square(x) for x in range(1000)]

Instead in this way we are going to create a lambda function object for each iteration. See this dummy example:

[lambda x: x*x for _ in range(3)]

gives:

[<function <listcomp>.<lambda> at 0x294358950>, 
 <function <listcomp>.<lambda> at 0x294358c80>, 
 <function <listcomp>.<lambda> at 0x294358378>]

The memory addresses of the lambdas, they are all different. Then a different object is created for each lambda.

Upvotes: 1

What is the memory cost of multiple duplicate lambda functions in Python?

Answers (2)

Related Questions