Reputation: 171
Not entirely sure if I've worded that correctly, but here's what I'm trying to do.
I have a file which I typically open in a GUI hex editor, make a few modifications to, then save and exit. I've been trying to automate this process entirely with Python, but I can't seem to get my regex search pattern right. Hopefully somebody can take a moment to see why not?
import binascii, re
infile = my_file.bin
with open(infile, "rb") as f:
    data = binascii.b2a_hex(f.read()).upper()
for matches in list(data):
    match_list = []
    matches = re.findall(b'\x24' + b'\x([A-Z]).{3,10}', data)
    match_list.append(matches)
The problem I have is trying to use a special sequence in place of a hex character, since there are many sequences within the original file that I manually search for in order to make the modifications. All sequences begin with '$' in hex ('\x24'), though not all of them have the same length. They all have at least 3 following characters, and I want to make sure I catch them all, which explains the {3,10}.
The end goal is to output the found sequences into a list for reference, and then to create a dictionary pairing each sequence with the offset it was found at. I've looked through page after page of docs trying to find an understandable way to go about this, and I think it can be achieved with the re.groupdict function, though I'm at a loss at this point. Any advice/help is appreciated.
[EDIT] Just found a similar question here, though I still feel my situation is different in that my regex pattern uses a special sequence instead of a static search.
Upvotes: 0
Views: 3074
Reputation: 149075
You have no reason to convert anything into hex: Python's re module can easily search in raw byte strings. But you really should loop with search instead of using findall in order to get the offsets where the strings are found.
The code could become:
import re
infile = "my_file.bin"
with open(infile, "rb") as f:
    data = f.read()
matches = []  # initializes the list for the matches
curpos = 0  # current search position (starts at beginning)
pattern = re.compile(br'\$[A-Z]{3,10}')  # the pattern to search
while True:
    m = pattern.search(data[curpos:])  # search next occurrence
    if m is None:
        break  # no more could be found: exit loop
    # append a pair (pos, string) to matches; append() takes a single argument, hence the tuple
    matches.append((curpos + m.start(), m.group(0)))
    curpos += m.end()  # next search will start after the end of found string
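Since the question also asks for a dictionary of sequences and offsets, here is one possible variation using the compiled pattern's finditer method, which yields match objects carrying absolute offsets without slicing the data. This is only a sketch: the helper name find_sequences and the sample bytes are made up for illustration, and the dictionary is keyed by offset (not by sequence) on the assumption that the same sequence may occur more than once in the file.

```python
import re

# Same pattern as in the loop above: '$' followed by 3 to 10 uppercase letters.
PATTERN = re.compile(rb'\$[A-Z]{3,10}')

def find_sequences(data):
    """Hypothetical helper: return {offset: matched bytes} for each sequence in data."""
    # finditer scans the whole buffer once; m.start() is already an absolute offset.
    return {m.start(): m.group(0) for m in PATTERN.finditer(data)}

# Made-up sample bytes standing in for the contents of my_file.bin.
sample = b'\x00\x01$ABC\xff\xfe$HELLO\x00$AB\x00'
found = find_sequences(sample)
for offset, seq in found.items():
    print(f"0x{offset:04X}: {seq!r}")
```

With the sample above, the trailing $AB is skipped because it has fewer than 3 letters after the '$'. For a real file, pass the bytes returned by f.read() instead of sample.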
Upvotes: 2