Compare bytes in Python 2 from file with hex literal?

Question

I'm trying to write a script to see if a given file has a Java classfile header, i.e. the first 4 bytes of the file are 0xCAFEBABE.

However, I'm not sure how to perform the equality check.

Here's my current scratch code:

import os
import struct
import sys

class JavaClassParser(object):
    def __init__(self, filename):
        self.filename = filename
        if not os.path.isfile(self.filename):
            print "Please supply a valid source path"
            sys.exit(1)

        with open(self.filename, 'rb') as f:
            self.data = f.read()

        self.verify_header()

    def verify_header(self):
        """ Verifies 0xCAFEBABE header present
            (Java class file header) """
        header = struct.unpack("cccc", self.data[:4])
        if header != 0xCAFEBABE:
            print "File", self.filename, "does not appear to be a valid" +\
                " Java classfile. Header was", repr(header), "expected", repr(0xCAFEBABE)
            sys.exit(1)

When I feed it a valid Java classfile, I receive:

File myclass.class does not appear to be a valid Java classfile. Header was ('\xca', '\xfe', '\xba', '\xbe') expected 3405691582

So Python interprets 0xCAFEBABE as an int, while unpack gives me a tuple of characters. I feel like I'm misunderstanding something fundamental here.
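To make the mismatch concrete, here is a minimal sketch (it runs under Python 2.6+ and Python 3, since both accept the b"" prefix) showing what the "cccc" format actually returns and why comparing that result with the hex literal can never succeed:

```python
import struct

# First four bytes of a valid class file (b"" literals work in Python 2.6+ and 3)
header_bytes = b"\xca\xfe\xba\xbe"

# "cccc" unpacks four separate one-byte strings into a 4-tuple
as_chars = struct.unpack("cccc", header_bytes)
print(repr(as_chars))

# The hex literal is just an ordinary int
print(0xCAFEBABE)  # 3405691582

# A tuple never compares equal to an int, so this is always False
print(as_chars == 0xCAFEBABE)  # False
```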

I could rewrite 0xCAFEBABE as "\xca\xfe\xba\xbe" and remove the unpack call, but I find that syntax ugly. Is there a way to get this working with the 0xCAFEBABE literal?
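For what it's worth, the byte-string comparison mentioned above doesn't require spelling out the escapes by hand: struct.pack can turn the 0xCAFEBABE literal into the same four bytes. A minimal sketch, where CLASS_MAGIC and starts_with_magic are hypothetical names chosen for illustration:

```python
import struct

# Hypothetical constant: ">L" packs the int as a 4-byte big-endian unsigned value,
# producing the bytes "\xca\xfe\xba\xbe"
CLASS_MAGIC = struct.pack(">L", 0xCAFEBABE)

def starts_with_magic(data):
    """Hypothetical helper: True if `data` begins with the class-file magic bytes."""
    return data[:4] == CLASS_MAGIC

print(CLASS_MAGIC == b"\xca\xfe\xba\xbe")                      # True
print(starts_with_magic(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))  # True
print(starts_with_magic(b"PK\x03\x04"))                        # False
```

This keeps the literal in the source while comparing raw bytes, so no unpacking of the file data is needed.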

tckmn · Accepted Answer

Try a different format string for struct.unpack:

>>> import struct
>>> header = "\xca\xfe\xba\xbe"
>>> struct.unpack(">L", header)
(3405691582,)
>>> struct.unpack(">L", header)[0] == 0xcafebabe
True

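According to the struct docs, L stands for "unsigned long", which is exactly 4 bytes when the format string starts with a byte-order character such as > (standard size), and > selects big-endian byte order, which is how class files store multi-byte values such as the magic number. Without a byte-order prefix, L uses the platform's native size, which is 8 bytes on many 64-bit systems, so unpacking a 4-byte string with plain "L" can fail there.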
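Putting this together, here is a minimal sketch of the check as a standalone function. The names has_class_header and CLASS_MAGIC are hypothetical, not part of the original code. It reads only the first 4 bytes instead of the whole file and treats a file shorter than 4 bytes as not a class file, since struct.unpack raises struct.error when the buffer length doesn't match the format (">I" would work equally well, as I is also 4 bytes in standard size):

```python
import os
import struct
import tempfile

# Hypothetical constant name for the Java class-file magic number
CLASS_MAGIC = 0xCAFEBABE

def has_class_header(path):
    """Hypothetical helper: True if the file at `path` starts with 0xCAFEBABE."""
    with open(path, "rb") as f:
        head = f.read(4)  # only the header is needed, not the whole file
    # struct.unpack raises struct.error if the buffer is not exactly 4 bytes,
    # so a file shorter than the header is simply rejected here
    if len(head) != 4:
        return False
    # ">L": big-endian byte order, standard 4-byte unsigned long
    (magic,) = struct.unpack(">L", head)
    return magic == CLASS_MAGIC

if __name__ == "__main__":
    # Write a tiny fake class file to a temporary path and check it
    fd, path = tempfile.mkstemp(suffix=".class")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xca\xfe\xba\xbe\x00\x00\x00\x34")
    try:
        print(has_class_header(path))  # True
    finally:
        os.remove(path)
```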
