Nico Schlömer

Reputation: 58791

recover source code from disassembled Python function

I have a Python file with the contents

def fun(x):
    return 2 * x + 5

When importing the file, I get access to the function object,

from mymodule import fun
print(fun)
<function mymodule.fun(x)>

I can now use dis to disassemble the bytecode and get

import dis
dis.dis(fun)
  2           0 LOAD_CONST               1 (2)
              2 LOAD_FAST                0 (x)
              4 BINARY_MULTIPLY
              6 LOAD_CONST               2 (5)
              8 BINARY_ADD
             10 RETURN_VALUE

From this, I could manually reconstruct the function source above. Is that always possible? How could I do that automatically if the function is more complex?

Upvotes: 1

Views: 565

Answers (1)

rocky

Reputation: 7098

From Wikipedia

A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level language. Decompilers are usually unable to perfectly reconstruct the original source code, thus frequently will produce obfuscated code. Nonetheless, decompilers remain an important tool in the reverse engineering of computer software.

Note:

If you read the rest of the Wikipedia article, its focus is on decompiling machine instructions back to languages that compile to machine code. Most implementations of Python are interpreters, though, and many interpreters do not work this way. Instead, they often compile to a high-level bytecode.

The high-level nature of the instructions you cite above is reflected in the fact that the program's variable names are preserved. This is in contrast to the register names and memory locations used in machine code.
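As a small illustration of the names surviving compilation, the function's code object can be inspected directly. This is only a sketch using the function from the question; the variable names `code` and `names` are mine.

```python
def fun(x):
    return 2 * x + 5

# The compiled code object keeps the names used in the source.
code = fun.__code__
names = code.co_varnames   # local variable names, including parameters

print(code.co_name)        # name of the function
print(names)               # names of its local variables
```

A decompiler can read these attributes back, which is why it can emit `x` rather than an invented placeholder name.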

That means that an operation like LOAD_CONST, LOAD_FAST, or BINARY_MULTIPLY must work on more complex data types than would fit in a CPU register. Take LOAD_CONST, for example: its operand can be a number, a string, a tuple, a frozenset, or another distinct data type.
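To see the kind of values a constant operand can refer to, one can look at the constants table of a code object. The sketch below assumes CPython; the function names `tuple_const` and `set_membership` are made up for illustration, and the exact contents of the table can differ between releases.

```python
def tuple_const():
    return (1, 2, 3)

def set_membership(x):
    return x in {1, 2, 3}

# co_consts holds the objects that constant-loading instructions refer to.
tuple_consts = tuple_const.__code__.co_consts
set_consts = set_membership.__code__.co_consts

print(tuple_consts)   # contains the whole tuple as a single constant
print(set_consts)     # CPython typically stores the set literal as a frozenset
```

The point is that a single instruction operand can stand for an entire container object, something a machine register cannot hold.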

I have written about decompilers for interpreters to high-level bytecode here: https://rocky.github.io/Deparsing-Paper.pdf

In Python, bytecode can vary from release to release. For example, in Python 3.6 the bytecode format changed so that an instruction (opcode plus operand) went from being either 1 or 3 bytes to a fixed size of 2 bytes. Since the offsets in your example always increase by 2, you ran it on Python 3.6 or later.
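The offsets can also be checked programmatically instead of being read off the `dis.dis` listing. This is a rough sketch, assuming CPython 3.6 or later; note that newer releases hide extra cache entries between instructions, so consecutive offsets may differ by more than 2 there while still being even.

```python
import dis
import sys

def fun(x):
    return 2 * x + 5

# Collect the byte offset of every instruction that dis reports.
offsets = [ins.offset for ins in dis.get_instructions(fun)]

print(sys.version_info[:2])   # interpreter version the bytecode belongs to
print(offsets)                # even offsets point to 2-byte instruction units
```

The opcode names printed by `dis` change between releases too, so a listing is only meaningful together with the version that produced it.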

Some Python decompilers work only for a specific version; others work for multiple versions of Python.

The answers to the question "What tools or libraries are there for decompiling python and exploring bytecode?" list a number of Python decompilers.
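On the "is that always possible?" part of the question: the exact source text generally cannot be recovered, because different source can compile to the same bytecode. A minimal sketch, with two hypothetical functions `compact` and `verbose` that differ only in a comment and redundant parentheses:

```python
def compact(x):
    return 2 * x + 5

def verbose(x):
    # This comment and the extra parentheses do not reach the bytecode.
    return ((2 * x) + 5)

# Compare the raw instruction bytes of the two functions.
same_bytecode = compact.__code__.co_code == verbose.__code__.co_code
print(same_bytecode)
```

Both functions compile to the same instruction bytes, so a decompiler can at best produce equivalent source, not the original layout or comments.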

Upvotes: 1
