Reputation: 2149
While reading "97 Things Every Programmer Should Know" I found an interesting essay about code analysis tools.
The author claims that the disassembler in the Python standard library can be very useful for debugging everyday code.
The quote: "One thing this library (Python standard library disassembler) can disassemble is your last stack trace, giving you feedback on exactly which bytecode instruction threw the last uncaught exception."
But the book gives no explanation of this.
So does anybody have an idea how this module could be useful for debugging?
Upvotes: 2
Views: 299
Reputation: 7098
While a disassembler can help you understand how Python interprets what you write, it is not the only tool. There are other tools that can help as well, and as we will see, some of them can work together.
So here's a little piece of Python:
def five():
    return 5
print(five())
And here is part of a disassembly of it, using the cross-platform disassembler I wrote, which is called xdis:
# Python bytecode 3.4 (3310)
# Disassembled from Python 3.4.2 (default, May 17 2015, 22:17:04)
# [GCC 4.8.2]
# Timestamp in code: 1499405520 (2017-07-07 01:32:00)
# Source code size mod 2**32: 39 bytes
# Method Name: <module>
# Filename: five.py
# Argument count: 0
# Kw-only arguments: 0
# Number of locals: 0
# Stack size: 2
# Flags: 0x00000040 (NOFREE)
# First Line: 1
# Constants:
# 0: <code object five at 0x7f99dd4e88a0, file "five.py", line 1>
# 1: 'five'
# 2: None
# Names:
# 0: five
# 1: print
1:
LOAD_CONST 0 (<code object five at 0x7f99dd4e88a0, file "five.py", line 1>)
LOAD_CONST 1 ('five')
MAKE_FUNCTION 0 (0 positional, 0 name and default, 0 annotations)
STORE_NAME 0 (five)
3:
LOAD_NAME 1 (print)
LOAD_NAME 0 (five)
CALL_FUNCTION 0 (0 positional, 0 keyword pair)
CALL_FUNCTION 1 (1 positional, 0 keyword pair)
POP_TOP
LOAD_CONST 2 (None)
RETURN_VALUE
...
(This is Python 3.4, other versions vary the details a little.)
The first thing to note is that Python thinks this code comes from a file with the path name five.py. If you happen to have renamed the file but not recompiled the Python code, this can confuse Python. Or the recorded filename could be tmp/five.py, and then you should look for that instead. Also, in Python versions 3 and up there is the size of the source file (modulo 2**32), which serves as a check to see if the five.py on the filesystem is the same as the one Python saw when it compiled the file.
I draw your attention to the beginning of the code: we are loading a constant object which happens to be the code for a function! Then comes the name of the function, and finally MAKE_FUNCTION runs and the result is stored in a variable called five.
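The same observation can be checked with the standard library's dis module rather than xdis. This is a minimal sketch, assuming a Python 3 interpreter; the names SOURCE, module_code, nested and opnames are made up for illustration, and the exact opcode names differ between Python versions.

```python
import dis
import types

SOURCE = "def five():\n    return 5\nprint(five())\n"

# Compile without running; "five.py" is only the label stored in the code object.
module_code = compile(SOURCE, "five.py", "exec")

# The function body is itself one of the constants of the module's code object.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# Opcode names of the module-level code (details vary by Python version).
opnames = [ins.opname for ins in dis.get_instructions(module_code)]

print(module_code.co_filename, [c.co_name for c in nested])
print("MAKE_FUNCTION" in opnames)
```

Calling dis.dis(module_code) prints a full listing comparable to the one above, although the layout is not the same as xdis output.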
The thing that is a bit unusual, if you are used to a compiled language like C++, Go, or Java that doesn't do this, is that the function is created right there on the spot when you run the program. If my program had another statement before the definition, like this:
x = five() # five hasn't been defined here!
def five(): ...
This would fail because MAKE_FUNCTION hasn't been run yet, so at that point five hasn't been defined.
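A small sketch of that failure, assuming Python 3. The snippet is run with exec so the error can be caught and inspected; EARLY_CALL and run_early_call are illustrative names, not part of any library.

```python
# The call comes before the def statement has executed.
EARLY_CALL = "x = five()\ndef five():\n    return 5\n"

def run_early_call():
    """Run the snippet in a fresh namespace and return the NameError, if any."""
    try:
        exec(compile(EARLY_CALL, "early_call.py", "exec"), {})
    except NameError as exc:
        return exc
    return None

error = run_early_call()
print(type(error).__name__, error)
```

The name lookup fails at run time, not at compile time: compile() accepts the source without complaint.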
You can also learn this using a debugger. I suggest trepan2 or trepan3k, which have a disassembly command built in and even a deparser for that bytecode.
Another place where disassembly can be elucidating is in the rare cases where Python does optimization on the code.
Consider this Python source code:
if 1:
    y = 5
Here, Python versions after around 2.3 will simply notice that the if 1: test is superfluous and remove it. But if you had, say, this instead:
x = 1
if x:
    y = 5
That is enough to make Python keep the test in. Disassembly is, I think, the only way you can know this.
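One way to check this yourself with the standard library, sketched for CPython 3. The helper has_conditional_jump is a made-up name, and matching on "JUMP" and "IF" in the opcode name is a heuristic, since the conditional-jump opcodes have been renamed across versions.

```python
import dis

def has_conditional_jump(source):
    """Return True if the compiled source contains a conditional jump opcode."""
    code = compile(source, "<example>", "exec")
    return any(
        "JUMP" in ins.opname and "IF" in ins.opname
        for ins in dis.get_instructions(code)
    )

# The constant test should be compiled away; the variable test should remain.
constant_test = has_conditional_jump("if 1:\n    y = 5\n")
variable_test = has_conditional_jump("x = 1\nif x:\n    y = 5\n")
print(constant_test, variable_test)
```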
The last aspect is understanding exactly where you are when you have stopped in a debugger or hit an error. You often (but not always) get the line where the error occurred, but sometimes that can be confusing. Normal Python hides a piece of information that is useful here, the instruction offset, but I'll show you how to get that and the instruction where the error occurred.
Suppose my code is:
prev = [100] + range(3)
x = prev[prev[prev[0]]]
If I run this I will get an IndexError exception. But which "prev" was it?
trepan2 (or trepan3k) exposes the instruction pointer here. It also gives access to both a disassembler and a deparser. So let's see how that can be used here:
trepan2 /tmp/boom.py
-> 2 prev = [100] + range(3)
(trepan2) next
(/tmp/boom.py:3 @19): <module>
-- 3 x = prev[prev[prev[0]]]
(trepan2) next
(/tmp/boom.py:3 @32): <module>
!! 3 x = prev[prev[prev[0]]]
R=> (<type 'exceptions.IndexError'>, 'list index out of range', <traceback object at
(trepan2) info pc
PC offset is 32.
2 0 LOAD_CONST 0 100
3 BUILD_LIST 1
6 LOAD_NAME 0 0
9 LOAD_CONST 1 3
12 CALL_FUNCTION 1 1 positional, 0 keyword pair
15 BINARY_ADD None
16 STORE_NAME 1 1
3 19 LOAD_NAME 1 1
22 LOAD_NAME 1 1
25 LOAD_NAME 1 1
28 LOAD_CONST 2 0
31 BINARY_SUBSCR None
--> 32 BINARY_SUBSCR None
33 BINARY_SUBSCR None
34 STORE_NAME 2 2
37 LOAD_CONST 3 None
40 RETURN_VALUE None
Ok. So we see where exactly we were, offset 32 (@32 after previously stopping at offset @19), but what does this mean? The trepan debuggers will convert this back into Python so you don't have to do that yourself:
(trepan2) deparse -p
instruction: 32 BINARY_SUBSCR
x = prev[prev[prev[0]]]
-------------
Contained in...
Grammar Symbol: binary_subscr
x = prev[prev[prev[0]]]
-------------------
(trepan2) prev
[100, 0, 1, 2]
The above, then, shows that you were at offset 32 (not 31 or 33) and that the failing prev access wasn't the first access, prev[0], but the one after that, prev[prev[0]].
Having both a disassembler and a deparser inside the debugger means you don't have to know that much about what's going on. But I don't think it hurts to know what the instructions do or what the sequence of instructions is.
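Without trepan, the standard library can recover a similar offset: a traceback object carries tb_lasti, and dis.distb() prints a listing that marks that instruction. The sketch below is a Python 3 variant of the example (range() is wrapped in list()), so its offsets and opcode names will not match the Python 2 session above. BOOM and failing_instruction are hypothetical names used only for illustration.

```python
import dis
import sys

# Python 3 spelling of the snippet above.
BOOM = "prev = [100] + list(range(3))\nx = prev[prev[prev[0]]]\n"

def failing_instruction(source, filename="boom.py"):
    """Return (line, offset, opname) for the instruction that raised, or None."""
    code = compile(source, filename, "exec")
    try:
        exec(code, {})
    except Exception:
        tb = sys.exc_info()[2]
        # Walk to the traceback entry that belongs to the compiled snippet.
        while tb is not None and tb.tb_frame.f_code is not code:
            tb = tb.tb_next
        if tb is None:
            return None
        # tb_lasti is the offset of the last instruction attempted in that frame.
        for ins in dis.get_instructions(code):
            if ins.offset == tb.tb_lasti:
                return tb.tb_lineno, tb.tb_lasti, ins.opname
        return tb.tb_lineno, tb.tb_lasti, None
    return None

result = failing_instruction(BOOM)
print(result)
```

This gives the offset and opcode but not the deparsed source fragment; mapping the offset back to prev[prev[0]] still means reading the listing or using a deparser.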
Upvotes: 3