rocky

Reputation: 7098

How does one disassemble Python graal bytecode?

I have been considering extending xdis, the cross-version Python disassembler, to support GraalPython.

GraalPython provides a Python Code type that is similar to Python's Code type, but the underlying bytecode bytes in co_code are different. In Python, co_code holds the encoded Python bytecode instructions. In Graal, I am given to understand that co_code contains JVM bytecode, but there seems to be more in it than just instructions.

Recall that bytecode operands typically are indexes into some other table like a constants pool or a variable-name list. Even though Graal's code type has this information stored in the other parts of the code type in the way that Python does it, I suspect there are additional tables in the co_code byte array.
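For comparison, here is a minimal sketch of how operands resolve through a side table in CPython, using only the standard dis module. The helper name resolve_const_operands is made up for illustration, and nothing here is claimed to work the same way on GraalPython's co_code.

```python
import dis


def five():
    return 5


def resolve_const_operands(code):
    """Pair each constant-indexing operand with the entry it selects in co_consts.

    Hypothetical helper for illustration; it relies only on dis.get_instructions
    and dis.hasconst from the CPython standard library.
    """
    resolved = []
    for ins in dis.get_instructions(code):
        # dis.hasconst lists the opcodes whose operand is an index into co_consts.
        if ins.opcode in dis.hasconst:
            resolved.append((ins.offset, ins.opname, ins.arg, code.co_consts[ins.arg]))
    return resolved


if __name__ == "__main__":
    # The exact opcodes vary by CPython version; some newer versions may encode
    # small integers directly in the operand, in which case nothing is listed.
    for row in resolve_const_operands(five.__code__):
        print(row)
```

The same pattern applies to the other tables (dis.hasname for co_names, dis.haslocal for co_varnames), which is why a disassembler needs both the instruction bytes and the tables they index.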

To give some idea of what is in the co_code byte array, consider this file:

def five():
    return 5

Using GraalVM Python 3.8.5 (GraalVM CE Native 22.2.0), a hexdump of the file produced by python -m compileall /tmp/five.py gives:

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
-------------------------------------------------
00000000: 9e52 0d0a 0000 0000 dd5b 6867 1900 0000  .R.......[hg....
00000010: c30c 0000 002f 746d 702f 6669 7665 2e70  ...../tmp/five.p
00000020: 7940 0000 009a 0000 000f 0007 6669 7665  y@..........five
00000030: 2e70 7900 0c2f 746d 702f 6669 7665 2e70  .py../tmp/five.p
00000040: 7900 0000 1964 6566 2066 6976 6528 293a  y....def five():
00000050: 0a20 2020 2072 6574 7572 6e20 350a 0000  .    return 5...
00000060: 025b 5d72 2fc8 0a00 0000 0000 0000 0000  .[]r/...........
00000070: 0000 0000 0000 0000 0000 0000 0101 0004  ................
00000080: 6669 7665 724b cb05 0100 0000 0000 0000  fiverK..........
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0702 1901 1501 724b cb05 0004 6669  ........rK....fi
000000b0: 7665 02ff ffff 0000 3007 1208 0120 011c  ve......0.... ..
000000c0: 1901 0501 0000 0000 0000 00 

The hexdump above contains module information, the main code, and what looks like embedded source text. The bytecode for function five() might start around offset 0x80.

Changing the return value from 5 to 6 changes:

00000080: 6669 7665 724b cb05 0100 0000 0000 0000  fiverK..........

to:

00000080: 6669 7665 36c7 9bee 0100 0000 0000 0000  five6...........
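To pin down exactly which offsets differ between two compiled files, a byte-level comparison helps. This is a small sketch in plain Python; changed_ranges is a hypothetical helper, and the sample data is just the two rows shown above.

```python
def changed_ranges(old, new, base=0):
    """Return half-open (start, end) offset ranges where two equal-length byte strings differ.

    Hypothetical helper for illustration; `base` is added to every offset so the
    result lines up with the addresses printed in a hexdump.
    """
    if len(old) != len(new):
        raise ValueError("inputs must have the same length")
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((base + start, base + i))
            start = None
    if start is not None:
        ranges.append((base + start, base + len(old)))
    return ranges


if __name__ == "__main__":
    # The two rows at 0x80 from the dumps above (return 5 versus return 6).
    row_5 = bytes.fromhex("6669 7665 724b cb05 0100 0000 0000 0000")
    row_6 = bytes.fromhex("6669 7665 36c7 9bee 0100 0000 0000 0000")
    for start, end in changed_ranges(row_5, row_6, base=0x80):
        print(f"{start:#x}..{end:#x}")
```

For these two rows, all four bytes at 0x84 through 0x87 change at once. That might point to a hash or checksum rather than the literal constant, but that reading is only a guess.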

In sum, how does one decipher this? Are there tools that can be used for doing so?

I had a bounty added to this which has expired with no potential answers. Should someone sufficiently answer this in the future and want a bounty for it, let me know after the answer is accepted.

Edit note: I have had problems with getting a hex dump that seems okay. Best to create your own using compileall and use your own hex dump routine.
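A minimal sketch of such a routine follows, in case it saves someone time. It assumes that compileall.compile_file and importlib.util.cache_from_source behave on GraalPython as they do on CPython, which I have not verified; hexdump and compile_and_dump are hypothetical helper names.

```python
import compileall
import importlib.util


def hexdump(data, width=16):
    """Format bytes as offset, 2-byte hex groups, and printable ASCII (xxd-like layout)."""
    lines = []
    hex_field = width * 2 + width // 2 - 1  # width of a full row of hex groups
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        digits = chunk.hex()
        groups = " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
        # Non-printable bytes are shown as "." in the text column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {groups:<{hex_field}}  {text}")
    return "\n".join(lines)


def compile_and_dump(source_path):
    """Byte-compile source_path and return a hexdump of the resulting cache file.

    Assumption: the running interpreter writes its cache file where
    importlib.util.cache_from_source() says it will.
    """
    if not compileall.compile_file(source_path, force=True, quiet=1):
        raise RuntimeError(f"could not compile {source_path}")
    cache_path = importlib.util.cache_from_source(source_path)
    with open(cache_path, "rb") as f:
        return hexdump(f.read())


if __name__ == "__main__":
    print(compile_and_dump("/tmp/five.py"))
```

Run it with the interpreter whose bytecode you want to inspect, so that the cache file comes from that implementation.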

Upvotes: 1

Views: 176

Answers (0)
