snowcrash09
snowcrash09

Reputation: 4804

find bytearray in bytearray with wildcards

I have some large binary files that I need to search for specific sequences of bytes within, e.g.:

find_bytes = bytearray(base64.b16decode('a226fb42'))
with open(filename, "rb") as f:
    file_bytes = bytearray(f.read())
    found_pos = file_bytes.find(find_bytes, 0)

This works great except now I want to be able to denote a specific byte value (say 00 or FF) in the sequence as a wildcard that will match any byte, so for example a2000042 should match any 4-byte sequence starting with a2 and ending 42.

Is there any way to extend the find method to do this, or a better solution?

Using Python 2.7 but willing to switch if necessary..

Upvotes: 3

Views: 8000

Answers (1)

isedev
isedev

Reputation: 19611

You could use a regular expression (they work on bytearrays):

>>> import re
>>> bytes = bytearray('\x01\x02\x03\x04\x05')
>>> re.search(b'\x02.\x04',bytes).group(0)
'\x02\x03\x04'

Just use '.' as the wildcard character.

Might cause problems on very large files though, since the entire file needs to be loaded into a string first.

Upvotes: 5

Related Questions