Strange behaviour when creating a function directly from CodeType and FunctionType in python

Question

DISCLAIMER: I know that what I'm doing should probably never ever be done in a real program.

I recently learned about Python's types.CodeType and types.FunctionType, and it got me interested in creating functions manually via these classes. As a small test, I started with a function that looked something like this:

def x(e, a, b=0, *c, d=0, **f):
    print(a, b, c, d, e, f)

and wanted to see if I could move the parameters around to turn it into this:

def x(a, b=0, *c, d=0, e=0, **f):
    print(a, b, c, d, e, f)

Essentially, I want to turn an ordinary positional parameter into a keyword-only argument. This is the code I used to do the mutation:

from types import CodeType, FunctionType

def x(e, a, b=0, *c, d=0, **f):
    print(a, b, c, d, e, f)

code = x.__code__
# Positional CodeType signature of Python 3.4-3.7.
codeobj = CodeType(
    code.co_argcount - 1,          # e is no longer positional...
    code.co_kwonlyargcount + 1,    # ...it is keyword-only instead
    code.co_nlocals, code.co_stacksize, code.co_flags, code.co_code,
    code.co_consts, code.co_names,
    ('a', 'b', 'd', 'e', 'c', 'f'),  # reordered variable names
    code.co_filename, code.co_name, code.co_firstlineno, code.co_lnotab,
    code.co_freevars, code.co_cellvars,
    )
new_func = FunctionType(codeobj, x.__globals__, x.__name__, x.__defaults__, x.__closure__)
new_func.__kwdefaults__ = {'d': 0, 'e': 0}

Strangely enough, the tooltip shows up correctly (the little yellow box of text that appears in the IDLE interpreter when you begin typing a function call): it displays "a, b=0, *c, d=0, e=0, **f". But the function's behaviour was interesting, to say the least:

>>> new_func(1)
0 0 () 0 1 {}
>>> new_func(1, 2)
2 0 () 0 1 {}

The first argument is still being bound to e, and the second is still being bound to a.

Is there a way to fix this? If so, would it require delving into code.co_code and breaking apart the opcodes, or is there a simpler method?
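The positional CodeType call in the question matches the Python 3.4-3.7 constructor; Python 3.8 inserted a posonlyargcount parameter (and later versions added more), so that call no longer works as written. A sketch of the same mutation for Python 3.8+, using CodeType.replace() to change only the fields involved, reproduces the mix-up. Here x returns its arguments instead of printing them, purely so the result is easy to check:

```python
from types import FunctionType

# Same function as in the question, but returning the values so they can be inspected.
def x(e, a, b=0, *c, d=0, **f):
    return (a, b, c, d, e, f)

code = x.__code__
# CodeType.replace() (Python 3.8+) copies the code object, changing only the named fields.
codeobj = code.replace(
    co_argcount=code.co_argcount - 1,              # e is no longer positional...
    co_kwonlyargcount=code.co_kwonlyargcount + 1,  # ...it is keyword-only instead
    co_varnames=('a', 'b', 'd', 'e', 'c', 'f'),    # only the names change; co_code is untouched
)
new_func = FunctionType(codeobj, x.__globals__, x.__name__, x.__defaults__, x.__closure__)
new_func.__kwdefaults__ = {'d': 0, 'e': 0}

print(new_func(1))     # (0, 0, (), 0, 1, {})
print(new_func(1, 2))  # (2, 0, (), 0, 1, {})
```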

Martijn Pieters · Accepted Answer

Functions and their code objects are tightly coupled. Arguments are handed in as locals, and locals are looked up by index:

>>> import dis
>>> def x(e, a, b=0, *c, d=0, **f):
...     print(a, b, c, d, e, f)
... 
>>> dis.dis(x)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_FAST                1 (a)
              6 LOAD_FAST                2 (b)
              9 LOAD_FAST                4 (c)
             12 LOAD_FAST                3 (d)
             15 LOAD_FAST                0 (e)
             18 LOAD_FAST                5 (f)
             21 CALL_FUNCTION            6 (6 positional, 0 keyword pair)
             24 POP_TOP
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE

Note the integers after the LOAD_FAST bytecodes: those are indices into the locals array. Reshuffling the names in co_varnames did not alter those indices in the bytecode.
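The slot numbers in that listing follow code.co_varnames, which CPython lays out as positional parameters first, then keyword-only parameters, then the *args name and finally the **kwargs name. That is why c (the *args tuple) sits at index 4 even though it is declared before d. A quick check:

```python
def x(e, a, b=0, *c, d=0, **f):
    print(a, b, c, d, e, f)

code = x.__code__
print(code.co_argcount, code.co_kwonlyargcount)  # 3 positional (e, a, b), 1 keyword-only (d)
for index, name in enumerate(code.co_varnames):
    print(index, name)  # 0 e, 1 a, 2 b, 3 d, 4 c, 5 f
```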

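The code.co_varnames tuple maps indices back to names for introspection (such as the dis output, or the signature shown in your tooltip), and at call time it is used to find the slot for each keyword argument and each __kwdefaults__ entry. The bytecode itself never consults it: every LOAD_FAST already carries its slot index.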
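A minimal sketch of what co_varnames does and does not control, assuming Python 3.8+ for CodeType.replace(); pair and swapped are hypothetical names used only for illustration. Swapping the names of the two slots changes how keyword arguments bind and what inspect.signature() reports, but not which slot the bytecode reads:

```python
import inspect
from types import FunctionType

def pair(p, q):
    return (p, q)  # compiled as "read slot 0, read slot 1"

# Swap only the names; the bytecode is left alone.
swapped = FunctionType(pair.__code__.replace(co_varnames=('q', 'p')), pair.__globals__, 'swapped')

print(swapped(1, 2))               # (1, 2): positional arguments still fill slots 0 and 1
print(swapped(p=1, q=2))           # (2, 1): keyword p now binds to slot 1
print(inspect.signature(swapped))  # (q, p)
```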

You'd have to perform surgery on the bytecode to change this; see the dis module documentation for details.

If you are using Python 3.4 or newer, you can use the new dis.get_instructions() function to iterate over an information-rich sequence of Instruction objects, which should make such surgery doable. Look for the LOAD_FAST instructions (and STORE_FAST or DELETE_FAST, if the function assigns to or deletes its arguments) and remap their indices as you produce the new bytecode.
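A sketch of that first step: list every instruction that reads or writes a fast local, with its offset, raw argument and the name dis resolved it to. On newer CPython versions some of these are variants or fused superinstructions (for example LOAD_FAST_LOAD_FAST in 3.13, whose single argument packs two slot indices), so the sketch matches on an opname prefix rather than on LOAD_FAST alone; treat that filter as a heuristic, and LOCAL_PREFIXES as a hypothetical name:

```python
import dis

def x(e, a, b=0, *c, d=0, **f):
    print(a, b, c, d, e, f)

# Opname prefixes of instructions that access the fast-locals array (heuristic).
LOCAL_PREFIXES = ("LOAD_FAST", "STORE_FAST", "DELETE_FAST")

local_ops = [
    (ins.offset, ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(x)
    if ins.opname.startswith(LOCAL_PREFIXES)
]
for row in local_ops:
    print(*row)  # offset, opname, raw argument, resolved name(s)
```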

Instruction objects don't have a method (yet) to convert them back to bytes; adding one is trivial:

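from dis import Instruction, HAVE_ARGUMENT

def to_bytes(self):
    # Python 3.4/3.5 bytecode layout: one opcode byte, followed by a 2-byte
    # little-endian argument for opcodes >= HAVE_ARGUMENT. Python 3.6+ uses
    # 2-byte "wordcode" instead, so this does not round-trip there.
    res = bytes([self.opcode])
    if self.opcode >= HAVE_ARGUMENT:
        res += (self.arg or 0).to_bytes(2, byteorder='little')
    return res

Instruction.to_bytes = to_bytes  # attach it to the Instruction class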

Demo:

>>> [ins.to_bytes() for ins in dis.get_instructions(code)]
[b't\x00\x00', b'|\x01\x00', b'|\x02\x00', b'|\x04\x00', b'|\x03\x00', b'|\x00\x00', b'|\x05\x00', b'\x83\x06\x00', b'\x01', b'd\x00\x00', b'S']
>>> b''.join([ins.to_bytes() for ins in dis.get_instructions(code)]) == code.co_code
True

Now all you have to do is map the .arg attribute of instructions with .opname == 'LOAD_FAST' to a new index.
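A sketch of that remapping for Python 3.8+. Rather than re-encoding instructions with the pre-3.6 to_bytes() format, it patches the argument byte of each local-variable instruction in a copy of co_code, then swaps in the new names with CodeType.replace(). It assumes 2-byte wordcode (Python 3.6+) and slot indices below 256 (no EXTENDED_ARG), and it treats opnames containing FAST twice (such as 3.13's LOAD_FAST_LOAD_FAST) as superinstructions that pack two 4-bit slot indices into one argument. remap_locals() is a hypothetical helper, and x returns its arguments instead of printing them so the result can be checked:

```python
import dis
from types import FunctionType

# Opname prefixes of instructions that access the fast-locals array (heuristic).
LOCAL_PREFIXES = ("LOAD_FAST", "STORE_FAST", "DELETE_FAST")

def remap_locals(code, new_varnames):
    """Return a copy of code.co_code whose fast-local slot indices follow new_varnames."""
    old_to_new = {i: new_varnames.index(name) for i, name in enumerate(code.co_varnames)}
    raw = bytearray(code.co_code)
    for ins in dis.get_instructions(code):
        if not ins.opname.startswith(LOCAL_PREFIXES):
            continue
        if ins.opname.count("FAST") == 2:
            # Superinstruction: high nibble is the first slot, low nibble the second.
            first, second = old_to_new[ins.arg >> 4], old_to_new[ins.arg & 15]
            if max(first, second) > 15:
                raise ValueError("slot index does not fit in 4 bits")
            new_arg = (first << 4) | second
        else:
            new_arg = old_to_new[ins.arg]
        if ins.arg > 255 or new_arg > 255:
            raise ValueError("EXTENDED_ARG is not handled in this sketch")
        raw[ins.offset + 1] = new_arg  # the argument byte follows the opcode byte
    return bytes(raw)

def x(e, a, b=0, *c, d=0, **f):
    return (a, b, c, d, e, f)

code = x.__code__
new_varnames = ('a', 'b', 'd', 'e', 'c', 'f')
codeobj = code.replace(
    co_code=remap_locals(code, new_varnames),
    co_argcount=code.co_argcount - 1,
    co_kwonlyargcount=code.co_kwonlyargcount + 1,
    co_varnames=new_varnames,
)
new_func = FunctionType(codeobj, x.__globals__, x.__name__, x.__defaults__, x.__closure__)
new_func.__kwdefaults__ = {'d': 0, 'e': 0}

print(new_func(1))             # (1, 0, (), 0, 0, {})
print(new_func(1, 2, 3, e=5))  # (1, 2, (3,), 0, 5, {})
```

Arguments captured by an inner function (cell variables) are accessed through different instructions (LOAD_DEREF and friends) and are not handled by this sketch.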
