Valentin Lorentz

Reputation: 9753

Python 3 bytecode format

I want to read a .pyc file. However, I cannot find any documentation on the format.

The only approach I found works for Python 2 but not for Python 3:

>>> import marshal
>>> f = open('__pycache__/foo.cpython-34.pyc', 'rb')
>>> f.read(4)
b'\xee\x0c\r\n'
>>> f.read(4)
b'\xf8\x17\x08W'
>>> marshal.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bad marshal data (unknown type code)

marshal only consumes one byte, \x00, which indeed is not a valid first byte for marshal data. (For comparison, the first byte of the Python 2 bytecode for the same empty module is c.)

So, how can I decode what comes after the header?

Upvotes: 1

Views: 1340

Answers (2)

aghast

Reputation: 15310

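Try this; it worked for me a while back. Python 3 added another int32 to the header, so you need to skip 12 bytes instead of 8 before unmarshalling. (The extra field, added in Python 3.3, is the source file size. Python 3.7 and later use a 16-byte header.)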

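import marshal
import os.path

def load_file(self, source):
    # Method of a loader class; the result is cached on self.code.
    if isinstance(source, str):
        if not os.path.exists(source):
            raise IOError("Cannot load_file('"
                + source
                + "'): does not exist")
        with open(source, "rb") as fh:
            # Skip the 12-byte header (magic number, mtime, source size).
            fh.read(12)
            # The rest of the file is the marshalled code object.
            self.code = marshal.load(fh)

    return self.code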
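
If you want something that does not hard-code the 12, here is a minimal standalone sketch that picks the header length from the running interpreter's version. The names pyc_header_size and load_pyc are made up for illustration, and it assumes the .pyc was written by the same Python version that reads it, since the marshal format is version-specific.

```python
import importlib.util
import marshal
import sys


def pyc_header_size(version=sys.version_info):
    """Return the .pyc header length in bytes for a given Python version."""
    if version >= (3, 7):
        return 16  # magic, flags, then mtime + source size (or a source hash)
    if version >= (3, 3):
        return 12  # magic, mtime, source size
    return 8       # magic, mtime


def load_pyc(path):
    """Return the code object stored in a .pyc written by this interpreter."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
        # Refuse files from another Python version instead of failing in marshal.
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("%s was compiled by a different Python version" % path)
        fh.read(pyc_header_size() - 4)  # skip the rest of the header
        return marshal.load(fh)
```

Calling load_pyc('__pycache__/foo.cpython-34.pyc') under Python 3.4 should then return the module's code object, which you can exec or inspect.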

Upvotes: 4

Robert Jacobs

Reputation: 3360

Have you looked at the disassembler? https://docs.python.org/3/library/dis.html
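
Note that dis works on code objects, not on the raw file, so you still have to get past the header first. As a rough sketch of what it does once you have a code object (here one is compiled in memory and round-tripped through marshal to stand in for the body of a .pyc; the source string and the "<example>" filename are arbitrary):

```python
import dis
import marshal

# Stand-in for the marshalled body of a .pyc file.
code = compile("x = 1\nprint(x)\n", "<example>", "exec")
restored = marshal.loads(marshal.dumps(code))

# Print a human-readable listing of the bytecode.
dis.dis(restored)

# Or walk the instructions programmatically.
instructions = [ins.opname for ins in dis.get_instructions(restored)]
```

The exact opcodes in the listing vary between Python versions.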

Upvotes: 0
