hex header of file, magic numbers, python

Question

I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)

The problem is the following: I read the first 24 bytes of a file:

with open (from_folder+"/"+i, "rb") as myfile:
    header=str(myfile.read(24))

Then I look for a pattern in it:

if y[1] in header:
    shutil.move (from_folder+"/"+i,to_folder+y[2]+i+y[3])

where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']

y[1] is the pattern, i.e. r'\x47\x40\x00'.

The file does contain these bytes, as you can see from the picture below.

But the program does NOT find this pattern (r'\x47\x40\x00') in the file header.

So I tried to print header:

You see? Python shows it as 'G@' instead of '\x47\x40'.

And if I search for 'G@'+r'\x00' in header, everything is OK: it finds it.

Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it. Not for some strange 'G@'+r'\x00'.

OR

Why does Python show the first two bytes as 'G@' and not as '\x47\x40', though it shows the rest of the header in hex? Is there a way to fix it?

User New · Accepted Answer

    import binascii

    with open(from_folder+"/"+i, "rb") as myfile:
        header = myfile.read(24)
        # hexlify returns bytes; decode to get a plain hex string
        header = binascii.hexlify(header).decode("ascii")

The result I get is the following, and I can work with it:

4740001b0000b00d0001c100000001efff3690e23dffffff
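
For illustration, here is a minimal sketch of how such a hex string might then be matched against a signature. The SIGNATURES table and the match_signature helper are hypothetical names, not taken from the linked repository; the sketch assumes the pattern is stored as plain hex digits ('474000') rather than as r'\x47\x40\x00'. Only matches at even offsets are accepted, since an odd offset in the hex string would start in the middle of a byte.

```python
import binascii

# Hypothetical signature table: same layout as `y` in the question,
# but with the pattern written as plain hex digits.
SIGNATURES = [
    ["/video", "474000", "/video/", ".ts"],
]

def match_signature(header_bytes, signatures=SIGNATURES):
    """Return the first entry whose hex pattern occurs in the header, else None."""
    header_hex = binascii.hexlify(header_bytes).decode("ascii")
    for entry in signatures:
        pos = header_hex.find(entry[1])
        while pos != -1:
            # Two hex digits per byte: an even offset means a byte boundary.
            if pos % 2 == 0:
                return entry
            pos = header_hex.find(entry[1], pos + 1)
    return None

# The 24 header bytes shown above, rebuilt from the hex dump.
sample = bytes.fromhex("4740001b0000b00d0001c100000001efff3690e23dffffff")
print(match_signature(sample))
```

With the header above this should return the '.ts' entry, and the matched entry can then be passed to shutil.move as in the question.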

P.S. Still, if anybody can explain what the problem was with the first 2 bytes, I would be grateful.

Answers (2)