Reputation: 3245
First of all, this post does NOT answer my question or give me any guidance toward answering it.
My question is about the mechanism by which functions resolve non-local variables.
Code
# code block 1
def func():
    vals = [0, 0, 0]
    other_vals = [7, 8, 9]
    other = 12
    def func1():
        vals[1] += 1
        print(vals)
    def func2():
        vals[2] += 2
        print(vals)
    return (func1, func2)
f1, f2 = func()
Try to run f1 and f2:
>>> f1()
[0, 1, 0]
>>> f2()
[0, 1, 2]
This shows that the object previously referred to by vals is shared by f1 and f2, and is not garbage collected after func finishes executing.
Will the objects referred to by other_vals and other be garbage collected? I think so. But how does Python decide not to garbage collect vals?
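As a quick check (a sketch of my own, assuming CPython's reference-counting behavior), an object with a __del__ method reports when it is actually destroyed:
# sketch: observing collection in CPython
class Tracked:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print(self.name, "collected")

def func():
    vals = Tracked("vals")              # referenced by the inner function
    other_vals = Tracked("other_vals")  # not referenced by the inner function
    def func1():
        print("inner sees", vals.name)
    return func1

f1 = func()   # prints "other_vals collected" as soon as func returns
f1()          # prints "inner sees vals"
del f1        # prints "vals collected" once the closure itself is gone
The other_vals object goes away as soon as func returns, but vals survives until f1 itself is dropped.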
Assumption 1
The Python interpreter resolves the variable names inside func1 and func2 to figure out which objects they reference, and increases the reference count of [0, 0, 0] by 1, preventing it from being garbage collected after the call to func.
But if I do
# code block 2
def outerfunc():
    def innerfunc():
        print(non_existent_variable)
f = outerfunc()
No error is reported. Furthermore,
# code block 3
def my_func():
    print(yet_to_define)
yet_to_define = "hello"
my_func()
works.
Assumption 2
Variable names are resolved dynamically at run time. This makes the observations in code blocks 2 and 3 easy to explain, but then how did the interpreter know it needs to increase the reference count of [0, 0, 0] in code block 1?
Which assumption is correct?
Upvotes: 2
Views: 294
Reputation: 55469
Your first example creates a closure; also see Why aren't python nested functions called closures?, Can you explain closures (as they relate to Python)?, and What exactly is contained within a obj.__closure__?
The closure mechanism ensures that the interpreter stores a reference to vals in the returned function objects func1 and func2. Your Assumption 1 is correct: that reference prevents vals from being garbage collected when func returns.
In your second example, the interpreter cannot see a reference to non_existent_variable in the enclosing scope(s), but that's OK because your Assumption 2 is also correct: you're free to use names that haven't yet been bound to objects at function definition time, so long as the name is in scope when you actually call the function.
The answer to "how did the interpreter know it needs to increase the reference count of [0, 0, 0] in code block 1?" is that building the closure is an explicit step the interpreter performs when it executes a function definition, i.e., when it creates a function object from the function definition in your script.
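You can see the result of that step by inspecting the function objects from your code block 1 (a quick sketch using the documented __code__ and __closure__ attributes):
f1, f2 = func()
print(f1.__code__.co_freevars)                 # ('vals',): free variables identified at compile time
print(f1.__closure__[0].cell_contents)         # [0, 0, 0]: the captured list
print(f1.__closure__[0] is f2.__closure__[0])  # True: f1 and f2 share the same cell
That shared cell is what keeps [0, 0, 0] alive after func returns.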
Every Python function object (both normal def-style functions and lambdas) has an attribute to store this closure information, with a minor difference between Python 2 and Python 3. See the links at the start of this answer for details, but I will mention here that Python 3 provides the nonlocal keyword, which works a bit like the global keyword: nonlocal allows you to make assignments to closed-over simple variables; J.F. Sebastian's answer has a simple example illustrating the use of nonlocal.
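For instance, a minimal sketch of nonlocal in action (my own illustration, not the one from J.F. Sebastian's answer):
def make_counter():
    count = 0
    def increment():
        nonlocal count      # rebind the enclosing count instead of creating a new local
        count += 1
        return count
    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
Without the nonlocal declaration, count += 1 would be treated as an assignment to a new local variable and raise UnboundLocalError.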
Note that with nested functions the inner function definitions are processed each time you call the outer function, which allows you to do things like:
def func(vals):
    def func1():
        vals[1] += 1
        print(vals)
    def func2():
        vals[2] += 2
        print(vals)
    return func1, func2

f1, f2 = func([0, 0, 0])
f1()
f2()

f1, f2 = func([10, 20, 30])
f1()
f2()
output
[0, 1, 0]
[0, 1, 2]
[10, 21, 30]
[10, 21, 32]
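You can confirm that each call builds its own closure cells (again via the __closure__ attribute):
f1a, f2a = func([0, 0, 0])
f1b, f2b = func([0, 0, 0])
print(f1a.__closure__[0] is f2a.__closure__[0])  # True: one call, one shared cell
print(f1a.__closure__[0] is f1b.__closure__[0])  # False: separate calls, separate cells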
Upvotes: 3