Reputation: 4804
I have some large binary files that I need to search for specific sequences of bytes within, e.g.:
find_bytes = bytearray(base64.b16decode('a226fb42'))
with open(filename, "rb") as f:
file_bytes = bytearray(f.read())
found_pos = file_bytes.find(find_bytes, 0)
This works great except now I want to be able to denote a specific byte value (say 00 or FF) in the sequence as a wildcard that will match any byte, so for example a2000042
should match any 4-byte sequence starting with a2
and ending 42
.
Is there any way to extend the find
method to do this, or a better solution?
Using Python 2.7 but willing to switch if necessary..
Upvotes: 3
Views: 8000
Reputation: 19611
You could use a regular expression (they work on bytearrays):
>>> import re
>>> bytes = bytearray('\x01\x02\x03\x04\x05')
>>> re.search(b'\x02.\x04',bytes).group(0)
'\x02\x03\x04'
Just use '.' as the wildcard character.
Might cause problems on very large files though, since the entire file needs to be loaded into a string first.
Upvotes: 5