Reputation: 39
I'd like to develop a small debugging tool for Python programs. For the "Dynamic Slicing" feature, I need to find the variables that are accessed in a statement, and find the type of access (read or write) for those variables.
But the only disassembly feature that's built into Python is dis.disassemble
, and that just prints the disassembly to standard output:
>>> dis.disassemble(compile('x = a + b', '', 'single'))
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 BINARY_ADD
7 STORE_NAME 2 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
I'd like to be able to transform the disassembly into a dictionary of sets describing which variables are used by each instruction, like this:
>>> my_disassemble('x = a + b')
{'LOAD_NAME': set(['a', 'b']), 'STORE_NAME': set(['x'])}
How can I do this?
Upvotes: 2
Views: 839
Reputation: 65854
Read the source code for the dis
module and you'll see that it's easy to do your own disassembly and generate whatever output format you like. Here's some code that generates the sequence of instructions in a code object, together with their arguments:
from opcode import *
def disassemble(co):
"""
Disassemble a code object and generate its instructions.
"""
code = co.co_code
n = len(code)
extended_arg = 0
i = 0
free = None
while i < n:
c = code[i]
op = ord(c)
i = i+1
if op < HAVE_ARGUMENT:
yield opname[op],
else:
oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
extended_arg = 0
i = i+2
if op == EXTENDED_ARG:
extended_arg = oparg*65536L
if op in hasconst:
arg = co.co_consts[oparg]
elif op in hasname:
arg = co.co_names[oparg]
elif op in hasjrel:
arg = repr(i + oparg)
elif op in haslocal:
arg = co.co_varnames[oparg]
elif op in hascompare:
arg = cmp_op[oparg]
elif op in hasfree:
if free is None:
free = co.co_cellvars + co.co_freevars
arg = free[oparg]
else:
arg = oparg
yield opname[op], arg
And here's an example disassembly.
>>> def f(x):
... return x + 1
...
>>> list(disassemble(f.func_code))
[('LOAD_FAST', 'x'), ('LOAD_CONST', 1), ('BINARY_ADD',), ('RETURN_VALUE',)]
You can easily transform this into the dictionary-of-sets data structure you want:
>>> from collections import defaultdict
>>> d = defaultdict(set)
>>> for op in disassemble(f.func_code):
... if len(op) == 2:
... d[op[0]].add(op[1])
...
>>> d
defaultdict(<type 'set'>, {'LOAD_FAST': set(['x']), 'LOAD_CONST': set([1])})
(Or you could generate the dictionary-of-sets data structure directly.)
Note that in your application you probably don't actually need look up the name for each opcode. Instead, you could look up the opcodes you need in the opcode.opmap
dictionary and create named constants, perhaps like this:
LOAD_FAST = opmap['LOAD_FAST'] # actual value is 124
...
for var in disassembly[LOAD_FAST]:
...
Update: in Python 3.4 you can use the new dis.get_instructions
:
>>> def f(x):
... return x + 1
>>> import dis
>>> list(dis.get_instructions(f))
[Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x',
argrepr='x', offset=0, starts_line=1, is_jump_target=False),
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1,
argrepr='1', offset=3, starts_line=None, is_jump_target=False),
Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None,
argrepr='', offset=6, starts_line=None, is_jump_target=False),
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None,
argrepr='', offset=7, starts_line=None, is_jump_target=False)]
Upvotes: 3
Reputation: 63727
I think the challenge here is to capture the output of a dis
rather than parsing the output and create a dictionary. The reason I will not cover the second part is, the format and the fields (key, value) of the dictionary is not mentioned and its trivial.
As I mentioned, the reason its a challenge to capture the OP of dis
is, its a print rather than a return, but this can be captured through context manager
def foo(co):
import sys
from contextlib import contextmanager
from cStringIO import StringIO
@contextmanager
def captureStdOut(output):
stdout = sys.stdout
sys.stdout = output
yield
sys.stdout = stdout
out = StringIO()
with captureStdOut(out):
dis.disassemble(co.func_code)
return out.getvalue()
import dis
import re
dict(re.findall("^.*?([A-Z_]+)\s+(.*)$", line)[0] for line in foo(foo).splitlines()
if line.strip())
{'LOAD_CONST': '0 (None)', 'WITH_CLEANUP': '', 'SETUP_WITH': '21 (to 107)', 'STORE_DEREF': '0 (sys)', 'POP_TOP': '', 'LOAD_FAST': '4 (out)', 'MAKE_CLOSURE': '0', 'STORE_FAST': '4 (out)', 'IMPORT_FROM': '4 (StringIO)', 'LOAD_GLOBAL': '5 (dis)', 'END_FINALLY': '', 'RETURN_VALUE': '', 'LOAD_CLOSURE': '0 (sys)', 'BUILD_TUPLE': '1', 'CALL_FUNCTION': '0', 'LOAD_ATTR': '8 (getvalue)', 'IMPORT_NAME': '3 (cStringIO)', 'POP_BLOCK': ''}
>>>
Upvotes: -1