Srwe

Reputation: 39

Disassemble Python code to dictionary

I'd like to develop a small debugging tool for Python programs. For the "Dynamic Slicing" feature, I need to find the variables that are accessed in a statement, and find the type of access (read or write) for those variables.

But the only disassembly feature that's built into Python is dis.disassemble, and that just prints the disassembly to standard output:

>>> dis.disassemble(compile('x = a + b', '', 'single'))
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD          
              7 STORE_NAME               2 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

I'd like to be able to transform the disassembly into a dictionary of sets describing which variables are used by each instruction, like this:

>>> my_disassemble('x = a + b')
{'LOAD_NAME': set(['a', 'b']), 'STORE_NAME': set(['x'])}

How can I do this?

Upvotes: 2

Views: 839

Answers (2)

Gareth Rees

Reputation: 65854

Read the source code for the dis module and you'll see that it's easy to do your own disassembly and generate whatever output format you like. Here's some code that generates the sequence of instructions in a code object, together with their arguments:

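# Python 2: co_code is a str, so each byte is decoded with ord().
from opcode import (HAVE_ARGUMENT, EXTENDED_ARG, opname, cmp_op,
                    hasconst, hasname, hasjrel, haslocal, hascompare, hasfree)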

def disassemble(co):
    """
    Disassemble a code object and generate its instructions.
    """
    code = co.co_code
    n = len(code)
    extended_arg = 0
    i = 0
    free = None
    while i < n:
        c = code[i]
        op = ord(c)
        i = i+1
        if op < HAVE_ARGUMENT:
            yield opname[op],
        else:
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536L
            if op in hasconst:
                arg = co.co_consts[oparg]
            elif op in hasname:
                arg = co.co_names[oparg]
            elif op in hasjrel:
                arg = repr(i + oparg)
            elif op in haslocal:
                arg = co.co_varnames[oparg]
            elif op in hascompare:
                arg = cmp_op[oparg]
            elif op in hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                arg = free[oparg]
            else:
                arg = oparg
            yield opname[op], arg

And here's an example disassembly.

>>> def f(x):
...     return x + 1
... 
>>> list(disassemble(f.func_code))
[('LOAD_FAST', 'x'), ('LOAD_CONST', 1), ('BINARY_ADD',), ('RETURN_VALUE',)]

You can easily transform this into the dictionary-of-sets data structure you want:

>>> from collections import defaultdict
>>> d = defaultdict(set)
>>> for op in disassemble(f.func_code):
...     if len(op) == 2:
...         d[op[0]].add(op[1])
... 
>>> d
defaultdict(<type 'set'>, {'LOAD_FAST': set(['x']), 'LOAD_CONST': set([1])})

(Or you could generate the dictionary-of-sets data structure directly.)
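On Python 3.4 or later, a minimal sketch of the question's my_disassemble can build the dictionary of sets directly from dis.get_instructions (the API described in the update at the end of this answer). It assumes the statement is compiled as module-level code and keeps only instructions whose opcode is in dis.hasname or dis.haslocal, so constants and argument-less instructions are dropped. Note that hasname also covers attribute access such as LOAD_ATTR, so attribute names would appear too.

```python
import dis
from collections import defaultdict


def my_disassemble(source):
    """Map each name-using opcode in `source` to the set of names it uses.

    Sketch for Python 3.4+; `source` is compiled as module-level code.
    """
    code = compile(source, "<string>", "exec")
    result = defaultdict(set)
    for instr in dis.get_instructions(code):
        # Keep only instructions whose argument is a name: module-level and
        # global names (hasname) or function locals (haslocal).
        if instr.opcode in dis.hasname or instr.opcode in dis.haslocal:
            result[instr.opname].add(instr.argval)
    return dict(result)


print(my_disassemble("x = a + b"))
```

get_instructions only walks the top-level code object, so functions or lambdas defined inside the statement would need to be disassembled separately.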

Note that in your application you probably don't actually need to look up the name for each opcode. Instead, you could look up the opcodes you need in the opcode.opmap dictionary and create named constants, perhaps like this:

from opcode import opmap
LOAD_FAST = opmap['LOAD_FAST']  # 124 in CPython 2.7; the number varies between versions
...
for var in disassembly[LOAD_FAST]:
    ...
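A Python 3 sketch of that opcode-keyed variant: the dictionary is keyed by opcode numbers taken from dis.opmap (the same mapping as opcode.opmap), and the named constants are looked up once so the loop compares integers rather than strings. The helper name disassemble_by_opcode is hypothetical, and the statement is assumed to be compiled as module-level code (hence LOAD_NAME rather than LOAD_FAST).

```python
import dis
from collections import defaultdict

# Named constants looked up once; their numeric values differ between versions.
LOAD_NAME = dis.opmap["LOAD_NAME"]
STORE_NAME = dis.opmap["STORE_NAME"]


def disassemble_by_opcode(source):
    """Hypothetical helper: map opcode number -> set of argument values."""
    result = defaultdict(set)
    for instr in dis.get_instructions(compile(source, "<string>", "exec")):
        if instr.arg is not None:  # skip instructions that take no argument
            result[instr.opcode].add(instr.argval)
    return result


disassembly = disassemble_by_opcode("x = a + b")
for var in sorted(disassembly[LOAD_NAME]):
    print("read", var)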

Update: in Python 3.4 you can use the new dis.get_instructions:

>>> def f(x):
...     return x + 1
... 
>>> import dis
>>> list(dis.get_instructions(f))
[Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x',
             argrepr='x', offset=0, starts_line=1, is_jump_target=False),
 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1,
             argrepr='1', offset=3, starts_line=None, is_jump_target=False),
 Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None,
             argrepr='', offset=6, starts_line=None, is_jump_target=False),
 Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None,
             argrepr='', offset=7, starts_line=None, is_jump_target=False)]

Upvotes: 3

Abhijit

Reputation: 63727

I think the challenge here is to capture the output of dis, rather than to parse that output into a dictionary. I won't cover the second part, because the format and the fields (keys and values) of the dictionary aren't specified, and it's trivial anyway.

As I mentioned, capturing the output of dis is the tricky part, because dis prints its result rather than returning it. You can capture it with a context manager:

def foo(func):
    import sys
    from contextlib import contextmanager
    from cStringIO import StringIO  # Python 2
    @contextmanager
    def captureStdOut(output):
        stdout = sys.stdout
        sys.stdout = output
        try:
            yield
        finally:
            sys.stdout = stdout  # restore stdout even if disassembly raises
    out = StringIO()
    with captureStdOut(out):
        dis.disassemble(func.func_code)  # uses the module-level "import dis" below
    return out.getvalue()

>>> import dis
>>> import re
>>> dict(re.findall(r"^.*?([A-Z_]+)\s+(.*)$", line)[0] for line in foo(foo).splitlines()
...                                                    if line.strip())
{'LOAD_CONST': '0 (None)', 'WITH_CLEANUP': '', 'SETUP_WITH': '21 (to 107)', 'STORE_DEREF': '0 (sys)', 'POP_TOP': '', 'LOAD_FAST': '4 (out)', 'MAKE_CLOSURE': '0', 'STORE_FAST': '4 (out)', 'IMPORT_FROM': '4 (StringIO)', 'LOAD_GLOBAL': '5 (dis)', 'END_FINALLY': '', 'RETURN_VALUE': '', 'LOAD_CLOSURE': '0 (sys)', 'BUILD_TUPLE': '1', 'CALL_FUNCTION': '0', 'LOAD_ATTR': '8 (getvalue)', 'IMPORT_NAME': '3 (cStringIO)', 'POP_BLOCK': ''}
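On Python 3.4 or later the hand-written context manager above is no longer needed: contextlib.redirect_stdout does the same job and restores sys.stdout even if an exception is raised, and dis.dis also accepts a file= argument directly. A minimal sketch (the name capture_dis is hypothetical):

```python
import contextlib
import dis
import io


def capture_dis(func):
    """Return the text that dis.dis(func) would print, as a string."""
    buf = io.StringIO()
    # Temporarily point sys.stdout at the buffer while dis.dis prints.
    with contextlib.redirect_stdout(buf):
        dis.dis(func)
    return buf.getvalue()


def f(x):
    return x + 1


text = capture_dis(f)
print(text)
```

Passing the buffer explicitly, as in dis.dis(f, file=buf), gives the same text without touching sys.stdout at all.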

Upvotes: -1
