Reputation: 51
I'm trying to search a binary file for a series of hexadecimal values, but I've run into a few issues that I can't quite solve. (1) I'm not sure how to search the entire file and return all the matches. Currently I have f.seek going only as far as where I think the value might be, which is no good. (2) I'd like to return the offset of each match in either decimal or hex, but I get 0 each time, so I'm not sure what I did wrong.
example.bin
AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE
BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC
code:
# coding: utf-8
import struct
import re
with open("example.bin", "rb") as f:
    f.seek(30)
    num, = struct.unpack(">H", f.read(2))
hexaPattern = re.compile(r'(0xebec)?')
m = re.search(hexaPattern, hex(num))
if m:
    print "found a match:", m.group(1)
    print " match offset:", m.start()
Maybe there's a better way to do all this?
Upvotes: 3
Views: 13137
Reputation: 48599
- I'm not sure how to search the entire file and return all the matches.
- I'd like to return the offset in either decimal or hex
import re
f = open('data.txt', 'wb')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.close()
f = open('data.txt', 'rb')
data = f.read()
f.close()
pattern = "\xEB\xEC"
regex = re.compile(pattern)
for match_obj in regex.finditer(data):
    offset = match_obj.start()
    print "decimal: {}".format(offset)
    print "hex(): " + hex(offset)
    print 'formatted hex: {:02X} \n'.format(offset)
--output:--
decimal: 2
hex(): 0x2
formatted hex: 02
decimal: 6
hex(): 0x6
formatted hex: 06
decimal: 10
hex(): 0xa
formatted hex: 0A
decimal: 14
hex(): 0xe
formatted hex: 0E
decimal: 18
hex(): 0x12
formatted hex: 12
decimal: 22
hex(): 0x16
formatted hex: 16
decimal: 26
hex(): 0x1a
formatted hex: 1A
The positions in the file use 0-based indexing, like a list.
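On the question's second point (the offset always being 0), a likely cause is the trailing `?` in the pattern, which makes the whole group optional, so the regex can match the empty string at index 0. The sketch below illustrates this; it is written for Python 3 (the thread's snippets are Python 2), and the names `m_miss` and `m_hit` are only illustrative.

```python
import re

# The question's pattern: the trailing "?" makes the whole group optional.
optional = re.compile(r'(0xebec)?')

# hex(0xAABB) is the text "0xaabb": the group cannot match,
# but the empty string still matches at index 0.
m_miss = re.search(optional, hex(0xAABB))
print(m_miss.group(1), m_miss.start())

# hex(0xEBEC) is the text "0xebec": the group matches, again at index 0,
# because the index is a position in this 6-character string, not in the file.
m_hit = re.search(optional, hex(0xEBEC))
print(m_hit.group(1), m_hit.start())
```

Either way `m.start()` reports a position inside the short string returned by `hex(num)`, so it cannot tell you where the bytes sit in the file. Searching the raw bytes directly avoids this.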
re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.
Match objects support the following methods and attributes:
start([group])
end([group])
Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring).
https://docs.python.org/2/library/re.html
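For readers on Python 3, the same `finditer` approach can be sketched as below. This is a minimal sketch, not the answer's original code: `find_offsets` is a hypothetical helper name, the pattern must be a bytes object to search bytes data, and `re.escape` is added on the assumption that the bytes should be matched literally. The data is the 32 bytes of example.bin from the question, built in memory.

```python
import re

def find_offsets(data, needle):
    """Return the start offset of every non-overlapping occurrence of needle in data."""
    # re.escape keeps bytes such as b'.' or b'*' from being read as regex syntax.
    return [m.start() for m in re.finditer(re.escape(needle), data)]

# The 32 bytes of example.bin from the question, built in memory.
data = bytes.fromhex(
    "AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE"
    "BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC"
)

for offset in find_offsets(data, b"\xEB\xEC"):
    print("decimal: {}  hex: {:#04x}".format(offset, offset))
```

With this data the only occurrence of EB EC starts at offset 30 (0x1e), which is the position the question was seeking to with f.seek(30).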
Upvotes: 3
Reputation: 1476
Try:
import re
with open("example.bin", "rb") as f:
    f1 = re.search(b'\xEB\xEC', f.read())
if f1:  # re.search returns None when nothing matches
    print "found a match:", f1.group()
    print " match offset:", f1.start()
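Note that `re.search` stops at the first match. If every occurrence is wanted without a regex, one possible alternative is a loop over `find`, which exists on both bytes objects and `mmap` objects. The sketch below is written for Python 3; `iter_offsets` is a hypothetical helper name, and the temporary file is only a stand-in for example.bin so the snippet runs on its own.

```python
import mmap
import os
import tempfile

def iter_offsets(buf, needle):
    """Yield every offset where needle occurs in buf, overlapping hits included."""
    pos = buf.find(needle)
    while pos != -1:
        yield pos
        # Resume one byte past the last hit so overlapping matches are found too.
        pos = buf.find(needle, pos + 1)

# Hypothetical stand-in for example.bin so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    f.write(b"\xAA\xBB\xEB\xEC" * 3)

with open(path, "rb") as f:
    # Map the file read-only instead of loading it all with f.read().
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offsets = list(iter_offsets(mm, b"\xEB\xEC"))

print(offsets)
```

Mapping the file is a design choice that may help with large files, since the operating system pages data in as needed; for small files, calling `iter_offsets` on the result of `f.read()` works the same way.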
Upvotes: 1