Reputation: 2692
CPython 3.7 introduced the ability to step through individual opcodes in a debugger. However, I can't figure out how to read variables out of the bytecode stack.
For example, when debugging
def f(a, b, c):
    return a * b + c

f(2, 3, 4)
I want to find out that the inputs of the addition are 6 and 4. Note how 6 never touches locals().
So far I could only come up with the opcode information, but I don't know how to get the opcode inputs:
import dis
import sys

def tracefunc(frame, event, arg):
    # Ask the interpreter to emit an "opcode" event for every instruction
    frame.f_trace_opcodes = True
    print(event, frame.f_lineno, frame.f_lasti, frame, arg)
    if event == "call":
        dis.dis(frame.f_code)
    elif event == "opcode":
        # Find the instruction that is about to execute
        instr = next(
            i for i in iter(dis.Bytecode(frame.f_code))
            if i.offset == frame.f_lasti
        )
        print(instr)
    print("-----------")
    return tracefunc

def f(a, b, c):
    return a * b + c

sys.settrace(tracefunc)
f(2, 3, 4)
Output:
call 19 -1 <frame at 0x7f97df618648, file 'test_trace.py', line 19, code f> None
 20           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_MULTIPLY
              6 LOAD_FAST                2 (c)
              8 BINARY_ADD
             10 RETURN_VALUE
-----------
line 20 0 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
-----------
opcode 20 0 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='a', argrepr='a', offset=0, starts_line=20, is_jump_target=False)
-----------
opcode 20 2 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='LOAD_FAST', opcode=124, arg=1, argval='b', argrepr='b', offset=2, starts_line=None, is_jump_target=False)
-----------
opcode 20 4 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='BINARY_MULTIPLY', opcode=20, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False)
-----------
opcode 20 6 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='LOAD_FAST', opcode=124, arg=2, argval='c', argrepr='c', offset=6, starts_line=None, is_jump_target=False)
-----------
opcode 20 8 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=8, starts_line=None, is_jump_target=False)
-----------
opcode 20 10 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> None
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=10, starts_line=None, is_jump_target=False)
-----------
return 20 10 <frame at 0x7f97df618648, file 'test_trace.py', line 20, code f> 10
-----------
Upvotes: 9
Views: 1186
Reputation: 6012
You can inspect CPython's inter-opcode state using a C extension, GDB, or dirty tricks (examples below).
CPython's bytecode is run by a stack machine. That means that all state between opcodes is kept in a stack of PyObject*s.
Let's take a quick look at CPython's frame object:
typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    ... // More fields
    PyObject **f_stacktop;
    ... // More fields
} PyFrameObject;
See the PyObject **f_stacktop right near the end? This is a pointer to the top of this stack (more precisely, to the slot just past the topmost item). Most, if not all, of CPython's opcodes use that stack to get their operands and store their results.
For example, let's take a look at the implementation of BINARY_ADD (addition with two operands):
case TARGET(BINARY_ADD): {
    PyObject *right = POP();
    PyObject *left = TOP();
    ... // sum = left + right
    SET_TOP(sum);
    ...
}
It takes the top two values off the stack, adds them, and puts the result back on the stack.
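To make the stack behaviour concrete, here is a minimal sketch of a toy stack machine running the same instruction sequence as f. It is only an illustration: PROGRAM, BINARY_OPS and run are hypothetical names of mine, the instruction names merely mirror the CPython 3.7 disassembly shown in the question, and none of this is CPython's actual evaluation loop.

```python
import operator

# Hand-written copy of the disassembly of f (not read from a real code object)
PROGRAM = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_MULTIPLY", None),
    ("LOAD_FAST", "c"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

BINARY_OPS = {"BINARY_MULTIPLY": operator.mul, "BINARY_ADD": operator.add}


def run(program, local_vars):
    """Execute the toy program, recording the stack before each instruction."""
    stack = []
    trace = []
    for opname, arg in program:
        trace.append((opname, list(stack)))
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif opname in BINARY_OPS:
            # Same order as in BINARY_ADD: the right operand is on top
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opname](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop(), trace
    raise RuntimeError("program ended without RETURN_VALUE")


result, trace = run(PROGRAM, {"a": 2, "b": 3, "c": 4})
for opname, before in trace:
    print(f"{opname:16} stack before: {before}")
print("result:", result)
```

The intermediate 6 only ever lives on the stack, right before BINARY_ADD runs, which is exactly why it cannot be found in locals().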
As we saw above, CPython's frame objects are native: PyFrameObject is a struct, and frameobject.c defines the Python-level interface that allows reading (and sometimes writing) some of its members.
The f_stacktop member is not exposed in Python, so to access it and read the stack you'll have to write some code in C or use GDB.
If you're writing a debugging-utils library, I'd recommend writing a C extension, which will allow you to write some basic primitives in C (like getting the current stack as a list of Python objects) and the rest of the logic in Python.
If it's a one-time thing, you could probably try playing around with GDB and inspecting the stack.
The plan: find the address of the stack and read the values stored in it straight from memory, in Python!
First, we need to find the offset of f_stacktop in the frame object.
I installed a debugging version of Python (on my Ubuntu it's apt install python3.7-dbg). This package includes a Python binary that contains debugging symbols (information about the program that helps debuggers).
dwarfdump is a utility that can read and display debugging symbols (DWARF is a common debugging-info format, used mostly in ELF binaries).
Running dwarfdump -S any=f_stacktop -Wc /usr/bin/python3.7-dbg provides us with the following output:
DW_TAG_member
    DW_AT_name                  f_stacktop
    DW_AT_decl_file             0x00000034 ./build-debug/../Include/frameobject.h
    DW_AT_decl_line             0x0000001c
    DW_AT_decl_column           0x00000010
    DW_AT_type                  <0x00001969>
    DW_AT_data_member_location  88
DW_AT_data_member_location sounds like the offset of f_stacktop!
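Reading a member at "object address + offset" relies on id(obj) being the object's memory address, which is a CPython implementation detail. A quick way to sanity-check that idea is to try it on a field whose location is well known. The sketch below assumes the default object header (ob_refcnt first, ob_type right after it); builds compiled with Py_TRACE_REFS or free-threaded builds lay the header out differently, so the default offset would be wrong there. type_pointer is a hypothetical helper name.

```python
import ctypes


def type_pointer(obj, offset=ctypes.sizeof(ctypes.c_ssize_t)):
    """Read the ob_type pointer directly from the object's memory.

    Assumes the default CPython object header, where ob_refcnt
    (one Py_ssize_t) comes first and ob_type follows it.
    """
    return ctypes.c_void_p.from_address(id(obj) + offset).value


value = [1, 2, 3]
print(hex(type_pointer(value)), hex(id(list)))
# On a build matching the assumption above, both addresses are the same
print(type_pointer(value) == id(type(value)))
```

If the two addresses match, raw offsets behave as expected on your interpreter, and the same trick can be applied to the frame object.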
Now let's write some code:
#!/usr/bin/python3.7-dbg
from ctypes import POINTER, py_object
# opname is a list of opcode names, where the indexes are the opcode numbers
from opcode import opname
import sys

# The offset we found using dwarfdump
F_STACKTOP = 88

def get_stack(frame):
    # Getting the address of the stack by adding
    # the address of the frame and the offset of the member
    f_stacktop_addr = id(frame) + F_STACKTOP
    # Initializing a PyObject** directly from memory using ctypes
    return POINTER(py_object).from_address(f_stacktop_addr)

def tracefunc(frame, event, arg):
    frame.f_trace_opcodes = True
    if event == 'opcode':
        # frame.f_code.co_code is the raw bytecode
        opcode = frame.f_code.co_code[frame.f_lasti]
        if opname[opcode] == 'BINARY_ADD':
            stack = get_stack(frame)
            # According to the implementation of BINARY_ADD,
            # the last two items in the stack should be the addition operands
            print(f'{stack[-2]} + {stack[-1]}')
    return tracefunc

def f(a, b, c):
    return a * b + c

sys.settrace(tracefunc)
f(2, 3, 4)
The output: 6 + 4! Great success! (said with a satisfied Borat voice)
This code is not portable yet, because F_STACKTOP will vary between Python binaries. To fix that, you could use ctypes.Structure to describe the frame object's layout and get the value of the f_stacktop member in a more portable fashion.
Note that doing that will hopefully make your code platform-independent, but it will not make it independent of the Python implementation. Code like that might only work with the CPython version you originally wrote it for, because to create a ctypes.Structure subclass you have to rely on CPython's implementation of frame objects (or more specifically, on the types and order of PyFrameObject's members).
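As a rough sketch of that idea, the snippet below mirrors the leading members of PyFrameObject with ctypes.Structure and lets ctypes compute the offset. The field list is my reading of CPython 3.7's frameobject.h and is an assumption you should check against your own headers. make_frame_struct and its trace_refs flag are hypothetical names; the flag models debug builds, which (as far as I know) prepend two extra pointers to every object in 3.7.

```python
import ctypes


def make_frame_struct(trace_refs=False):
    """Build a ctypes mirror of the start of PyFrameObject (assumed 3.7 layout)."""
    head = []
    if trace_refs:
        # Assumption: Py_TRACE_REFS builds add two list pointers per object
        head += [("_ob_next", ctypes.c_void_p), ("_ob_prev", ctypes.c_void_p)]
    # PyObject_VAR_HEAD
    head += [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
    ]

    class PyFrameObject(ctypes.Structure):
        # Only the members up to f_stacktop are needed to locate it
        _fields_ = head + [
            ("f_back", ctypes.c_void_p),
            ("f_code", ctypes.c_void_p),
            ("f_builtins", ctypes.c_void_p),
            ("f_globals", ctypes.c_void_p),
            ("f_locals", ctypes.c_void_p),
            ("f_valuestack", ctypes.POINTER(ctypes.py_object)),
            ("f_stacktop", ctypes.POINTER(ctypes.py_object)),
        ]

    return PyFrameObject


Frame = make_frame_struct()
DebugFrame = make_frame_struct(trace_refs=True)
# Offsets are computed by ctypes for the current platform's type sizes
print(Frame.f_stacktop.offset, DebugFrame.f_stacktop.offset)
```

With 8-byte pointers the debug variant comes out at 88, which agrees with the dwarfdump result above. On a 3.7 interpreter you could then use PyFrameObject.from_address(id(frame)).f_stacktop instead of the hard-coded offset. Do not try that on newer CPython versions, where the frame layout is different.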
Upvotes: 9