Jasper Bart

Reputation: 103

What are all those zeros in Python bytecode, and how do I compute them?

When I run list(some_function.__code__.co_code), I can see the actual bytecode of that function (as a list[int]). There are a lot of zeros, more than in earlier versions of Python. Yes, I've seen this question, but if I create the function in Python 3.12, there are more zeros in the bytecode. My question is: what do all those zeros mean, and if I want to write bytecode myself, how can I compute how many zeros are needed?

See what happens:

def f(x):
    return x + x/3

bytecode = list(f.__code__.co_code)
print(bytecode)

prints:

[151, 0, 124, 0, 100, 1, 124, 0, 122, 11, 0, 0, 122, 0, 0, 0, 83, 0]

import dis

def f(x):
    return x + x/3

dis.dis(f, show_caches=True)

gives:

  1           0 RESUME                   0
  2           2 LOAD_FAST                0 (x)
              4 LOAD_FAST                0 (x)
              6 LOAD_CONST               1 (3)
              8 BINARY_OP               11 (/)
             10 CACHE                    0 (counter: 0)
             12 BINARY_OP                0 (+)
             14 CACHE                    0 (counter: 0)
             16 RETURN_VALUE

This differs from the code in the already-mentioned question at a couple of points:

What's going on here? And why do all opcodes take an argument (even those whose opcode is less than dis.HAVE_ARGUMENT)?

While this is not super strange, it gets stranger with the following function:

def f():
    print("hello world!")

bytecode: [151, 0, 116, 1, 0, 0, 0, 0, 0, 0, 0, 0, 100, 1, 171, 1, 0, 0, 0, 0, 0, 0, 1, 0, 121, 0]

Could someone also explain all these zeros?

Thanks in advance!

EDIT

I see that all those zeros are CACHE opcodes, but how do I compute how many CACHE entries are needed?

EDIT

There is a suggestion that the 0s in the bytecode are arguments, not CACHE entries, but that assertion appears to be incorrect.

Looking at an annotated output of:

import dis

def f():
    print("hello world!")

print(list(f.__code__.co_code))

# Print each real instruction with its opcode number and operand.
for instr in dis.Bytecode(f):
    print(instr.opname, instr.opcode, instr.arg)

One can see:

[151, 0, 116, 1, 0, 0, 0, 0, 0, 0, 0, 0, 100, 1, 171, 1, 0, 0, 0, 0, 0, 0, 1, 0, 121, 0]
   |  |    |  |                            |  |    |  |                    |  |    |  |
   |  |    |  |                            |  |    |  |                    |  |    |  |
   |  |    |  |                            |  |    |  |                    |  |    |  |
   |  |    |  |                            |  |    |  |                    |  |    |  |
   |--|    |--|---------|                  |  |    |  |                    |  |    |  |
   |--|---------------| |                  |  |    |  |                    |  |    |  |
                      | |                  |  |    |  |                    |  |    |  |
RESUME 151 0        --| |                  |  |    |  |                    |  |    |  |
LOAD_GLOBAL 116 1   ----|                  |  |    |  |                    |  |    |  |
LOAD_CONST 100 1    -----------------------|--|    |  |                    |  |    |  |
CALL 171 1          -------------------------------|--|                    |  |    |  |
POP_TOP 1 None      -------------------------------------------------------|--|    |  |
RETURN_CONST 121 0  ---------------------------------------------------------------|--|

Many of the 0 values that the loop does not point to are reported as "CACHE" by dis.dis(f, show_caches=True).
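The same check can be sketched programmatically. This is only a rough illustration: it assumes that every 2-byte unit between one instruction's offset and the next instruction's offset (as reported by dis without show_caches) is a CACHE entry, and the helper name cache_counts is made up for this example.

```python
import dis

def f():
    print("hello world!")

def cache_counts(func):
    """Return (opname, arg, n_cache) for each instruction of func.

    n_cache is inferred from instruction offsets, not read from any
    official table: it is the number of 2-byte units that follow the
    instruction before the next real instruction starts.
    """
    code = func.__code__
    instrs = list(dis.get_instructions(func))
    # Each instruction ends where the next one begins; the last one
    # ends at the end of co_code.
    ends = [i.offset for i in instrs[1:]] + [len(code.co_code)]
    rows = []
    for instr, end in zip(instrs, ends):
        units = (end - instr.offset) // 2  # opcode/operand pair + caches
        rows.append((instr.opname, instr.arg, units - 1))
    return rows

rows = cache_counts(f)
for opname, arg, n_cache in rows:
    print(opname, arg, n_cache)
```

For the byte list shown above, this lines up with the diagram: LOAD_GLOBAL is followed by four 2-byte units of zeros and CALL by three, while the other instructions have none. On interpreters older than 3.11 every count should come out as 0, because there are no inline caches there.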

Upvotes: 0

Views: 153

Answers (1)

rocky

Reputation: 7098

The way you are asking the question feels strange to me, because I think what you want to know is how to write bytecode instructions and then convert them to bytecode bytes.

For writing bytecode and having it converted to a bytecode file, see xasm. It is on PyPI, but I haven't put out a new release in quite a while, so build it from source on GitHub.

For understanding how bytecode works interactively, see x-python. The same story applies: it is on PyPI, but the release there is old. For x-python, there is even a debugger that lets you single-step instructions, called trepan-xpy. It might be the hardest to install because it has a lot of dependencies, such as the trepan debugger.

I will be talking at BlackHat Asia mid April 2024, so there may be new releases sometime a little before that.

Before I attempt to answer your question as stated, let me give an analogous situation. Suppose I am trying to learn numbers, and I see there are all these different base systems (which are analogous to Python bytecode versions). Someone asks why there are more zeros in the binary version of the numbers than in the base 10 version of the numbers. Well, that's because there are fewer digits to pick from, so each digit appears more often.

In bytecode, you'll see a lot of zeros because zero is the smallest integer, and many instruction operands are indexes into a table of some sort, such as the tuple of constants or the tuple of variable names. If the tuple is not empty, then it has an item 0. And since the tuple exists only because something needs it, there will probably be at least one instruction whose operand is 0, referring to that item.
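To make the "operand as table index" idea concrete, here is a minimal sketch. The function greet is a made-up example, and the sketch only looks at instructions that dis lists in dis.hasconst (those whose operand indexes co_consts); which opcodes appear will vary between Python versions.

```python
import dis

def greet():
    return "hello world!"

code = greet.__code__

# Collect every instruction whose operand is an index into co_consts.
pairs = [
    (instr.opname, instr.arg, instr.argval)
    for instr in dis.get_instructions(greet)
    if instr.opcode in dis.hasconst
]

print(code.co_consts)
for opname, arg, argval in pairs:
    # The raw operand is just a small integer; the constant it names
    # is found by indexing the tuple of constants.
    print(opname, arg, repr(code.co_consts[arg]))
```

The operand printed for each instruction is a position in co_consts, so small values such as 0 and 1 are exactly what you would expect to see in the byte list.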

As was mentioned before, 0 is also used in bytecode as a placeholder when there is no value. So instructions that don't have an operand typically put 0 in that position. I don't know if this is strictly required. Someone might check whether some other value works, and whether the Python interpreter ignores it as well as it ignores an operand of 0.
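One way someone might run that check is sketched below. It assumes CPython 3.8 or later (for code.replace) and that RETURN_VALUE does not read its operand byte; the value 7 and the names raw and g are arbitrary choices for this illustration. Hand-patching bytecode can crash the interpreter if the wrong byte is changed, so treat this as an experiment and not as a recommended technique.

```python
import dis

def f(x):
    return x + x / 3

code = f.__code__
raw = bytearray(code.co_code)

# Overwrite the operand byte of every RETURN_VALUE (an instruction that
# takes no operand) with an arbitrary non-zero value.
patched_offsets = []
for instr in dis.get_instructions(f):
    if instr.opname == "RETURN_VALUE":
        raw[instr.offset + 1] = 7
        patched_offsets.append(instr.offset)

# Build a second function object around the patched bytecode.
g = type(f)(code.replace(co_code=bytes(raw)), f.__globals__)

print(patched_offsets)
print(f(3), g(3))
```

If the two printed results agree, the interpreter ignored the non-zero operand for that instruction. That says nothing about other operand-less instructions, which would each need their own check.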

Upvotes: 1
