jamadagni
jamadagni

Reputation: 1264

In Python, are function-level assignments evaluated at each function call?

Please consider the following code:

import re

def qcharToUnicode(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

def fixSurrogatePresence(s) :
    '''Returns the input UTF-16 string with surrogate pairs replaced by the character they represent'''
    # ideas from:
    # http://www.unicode.org/faq/utf_bom.html#utf16-4
    # http://stackoverflow.com/a/6928284/1503120
    def joinSurrogates(match) :
        SURROGATE_OFFSET = 0x10000 - ( 0xD800 << 10 ) - 0xDC00
        return chr ( ( ord(match.group(1)) << 10 ) + ord(match.group(2)) + SURROGATE_OFFSET )
    return re.sub ( '([\uD800-\uDBFF])([\uDC00-\uDFFF])', joinSurrogates, s )

Now my questions below probably reflect a C/C++ way of thinking (and not a "Pythonic" one) but I'm curious nevertheless:

I'd like to know whether the evaluation of the compiled RE object p in qcharToUnicode and SURROGATE_OFFSET in joinSurrogates will take place at each call to the respective functions or only once at the point of definition? I mean in C/C++ one can declare the values as static const and the compile will (IIUC) make the construction occur only once, but in Python we do not have any such declarations.

The question is more pertinent in the case of the compiled RE object, since it seems that the only reason to construct such an object is to avoid the repeated compilation, as the Python RE HOWTO says:

Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re accessing a regex within a loop, pre-compiling it will save a few function calls.

... and this purpose would be defeated if the compilation were to occur at each function call. I don't want to put the symbol p (or SURROGATE_OFFSET) at module level since I want to restrict its visibility to the relevant function only.

So does the interpreter do something like heuristically determine that the value pointed to by a particular symbol is constant (and visible within a particular function only) and hence need not be reconstructed at next function? Further, is this defined by the language or implementation-dependent? (I hope I'm not asking too much!)

A related question would be about the construction of the function object lambda m in qcharToUnicode -- is it also defined only once like other named function objects declared by def?

Upvotes: 0

Views: 135

Answers (3)

chthonicdaemon
chthonicdaemon

Reputation: 19760

The simple answer is that as written, the code will be executed repeatedly at every function call. There is no implicit caching mechanism in Python for the case you describe.

You should get out of the habit of talking about "declarations". A function definition is in fact also "just" a normal statement, so I can write a loop which defines the same function repeatedly:

for i in range(10):
    def f(x):
        return x*2
    y = f(i)

Here, we will incur the cost of creating the function at every loop run. Timing reveals that this code runs in about 75% of the time of the previous code:

def f(x):
    return x*2

for i in range(10):
    y = f(i)

The standard way of optimising the RE case is as you already know to place the p variable in the module scope, i.e.:

p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

def qcharToUnicode(s):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

You can use conventions like prepending "_" to the variable to indicate it is not supposed to be used, but normally people won't use it if you haven't documented it. A trick to make the RE function-local is to use a consequence about default parameters: they are executed at the same time as the function definition, so you can do this:

def qcharToUnicode(s, p=re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

This will allow you the same optimisation but also a little more flexibility in your matching function.

Thinking properly about function definitions also allows you to stop thinking about lambda as different from def. The only difference is that def also binds the function object to a name - the underlying object created is the same.

Upvotes: 3

stewSquared
stewSquared

Reputation: 877

Yes, they are. Suppose re.compile() had a side-effect. That side effect would happen everytime the assignment to p was made, ie., every time the function containing said assignment was called.

This can be verified:

def foo():
    print("ahahaha!")
    return bar

def f():
    return foo()
def funcWithSideEffect():
    print("The airspeed velocity of an unladen swallow (european) is...")
    return 25

def funcEnclosingAssignment():
    p = funcWithSideEffect()
    return p;

a = funcEnclosingAssignment()
b = funcEnclosingAssignment()
c = funcEnclosingAssignment()

Each time the enclosing function (analogous to your qcharToUnicode) is called, the statement is printed, revealing that p is being re-evaluated.

Upvotes: 1

nmenezes
nmenezes

Reputation: 930

Python is a script/interpreted language... so yes, the assignment will be made every time you call the function. The interpreter will parse your code only once, generating Python bytecode. The next time you call this function, it will be already compiled into Python VM bytecode, so the function will be simply executed.

The re.compile will be called every time, as it would be in other languages. If you want to mimic a static initialization, consider using a global variable, this way it will be called only once. Better, you can create a class with static methods and static members (class and not instance members).

You can check all this using the dis module in Python. So, I just copied and pasted your code in a teste.py module.

>>> import teste
>>> import dis
>>> dis.dis(teste.qcharToUnicode)
  4           0 LOAD_GLOBAL              0 (re)
              3 LOAD_ATTR                1 (compile)
              6 LOAD_CONST               1 ('QChar\\((0x[a-fA-F0-9]*)\\)')
              9 CALL_FUNCTION            1
             12 STORE_FAST               1 (p)

  5          15 LOAD_FAST                1 (p)
             18 LOAD_ATTR                2 (sub)
             21 LOAD_CONST               2 (<code object <lambda> at 0056C140, file "teste.py", line 5>)
             24 MAKE_FUNCTION            0
             27 LOAD_FAST                0 (s)
             30 CALL_FUNCTION            2
             33 RETURN_VALUE

Upvotes: 1

Related Questions