Reputation: 2047
I'm trying to work out how a binary settings file is structured so that I can pull some information out of it, such as file locations.
As far as I can tell, the interesting data sits exactly after (or near) an "escape char" \x03 followed by the setting name, i.e. b'\x03SETTING'. Here's an example with a setting I'm interested in, 'LQ':
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01
\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a08.56d829a7_56d82956d829a0.mxf
\x03HTAPp\\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01
\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a07.56d829a6_56d82956d829a0.mxf
So it looks like each 'sentence' starts with \x03, and the path I'm looking for begins 8 bytes after the start of the LQ setting, b'\x03LQ'.
The file also has other settings that I want to capture, and each one seems to follow the same pattern: the setting comes directly after an escape char and is padded by a short description of the setting and a number of bytes.
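One quick way to check this reading of the layout is to split the raw bytes on every \x03 and look at the start of each piece. The sketch below assumes the sample bytes were reconstructed by hand from the dump above (the all-zero line is left out, the doubled backslash after HTAPp in one line is treated as a typo, and the newlines between records are a guess), so the real file may differ. It also assumes \x03 never appears inside a value; if it does, those values get split too.

```python
# Hypothetical sample bytes, reconstructed by hand from the dump above
# (an assumption: the real file, including the newlines, may differ).
SAMPLE = (
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01\n'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2'
    b'\\a08.56d829a7_56d82956d829a0.mxf\n'
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01\n'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2'
    b'\\a07.56d829a6_56d82956d829a0.mxf'
)


def split_records(data):
    # Split on every 0x03 byte; the first piece is whatever precedes the
    # first marker (empty here), so it is dropped.
    return data.split(b'\x03')[1:]


for record in split_records(SAMPLE):
    # The first few bytes show the setting name followed by its padding.
    print(record[:8])
```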
At the moment I read the whole binary and can extract one specific path, but only if I already know how long it is:
```python
import re

with open(file, "rb") as abin:
    abin.seek(0)
    data = abin.read()
    foo = re.search(b'\x03LQ', data)
    abin.seek(foo.start() + 8)  # cursor lands on the 8th byte
    eg = abin.read(32)
    # so I get a path of some fixed length as eg...
```
This is not what I want: I'd like to read the entire byte string up to the next escape char, then find the next occurrence of the setting and read that path too.
I'm experimenting with findall(), but it just returns a list of bytes objects that all seem to be identical, and I don't understand how to find each unique path (each instance of the byte string) and read from the matching cursor position in the data. For example:
```python
# runs inside the same `with` block as above, so abin is still open
bar = re.findall(b'\x03LQ', data)
for bs in bar:
    foo = re.search(bs, data)
    abin.seek(foo.start() + 8)
    eg = abin.read(64)
    print('This is just the same path each time', eg)
```
Pointers anyone?
Upvotes: 0
Views: 385
Reputation: 7850
The key is to look at the result of your findall(), which is just going to be:

```python
[b'\x03LQ', b'\x03LQ', b'\x03LQ', ...]
```
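This follows from how re.findall reports results: with no groups in the pattern each item is the whole match, with one group each item is that group's text, and with several groups each item is a tuple of the groups' texts. A small check on a shortened, made-up sample (not the real file contents):

```python
import re

# Hypothetical, shortened stand-in for the real data.
data = b'\x03LQ\x00\x00\x00\\\\Media\\a08.mxf\n\x03LQ\x00\x00\x00\\\\Media\\a07.mxf'

# No group: every item is the whole match, i.e. the literal pattern again.
no_group = re.findall(rb'\x03LQ', data)
print(no_group)  # [b'\x03LQ', b'\x03LQ']

# Two groups: every item is a (name, value) tuple, where the value runs
# up to the next 0x03 byte.
two_groups = re.findall(rb'\x03(LQ)([^\x03]*)', data)
print(two_groups)
```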
You're only telling it to find a static string, so that's all it's going to return. To make the results useful, you can tell it to capture what comes after the given string instead. Here's an example that grabs everything after the given string up to the next \x03 byte:
```python
re.findall(rb'\x03LQ([^\x03]*)', data)
```
The parentheses tell findall() which part of the match you want, and [^\x03]* means "match any number of bytes that are not \x03". The result from your example should be:
```python
[b'\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a08.56d829a7_56d82956d829a0.mxf\n',
 b'\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2\\a07.56d829a6_56d82956d829a0.mxf']
```
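Two follow-ups may help. The loop in the question prints the same path every time because re.search(bs, data) always scans data from the beginning, so it always returns the first occurrence; re.finditer instead yields one match object per occurrence, each with its own start(). Also, the captured values still carry the leading \x00 padding. The sketch below combines both and captures every setting, not just LQ. It rests on loudly stated assumptions: the sample bytes are reconstructed by hand from the question's dump, the setting name is taken to be a run of ASCII letters right after \x03 (inferred from only two record types), and latin-1 is a guessed encoding. The names SETTING_RE, iter_settings and clean_value are hypothetical.

```python
import re

# Hypothetical sample bytes, reconstructed by hand from the question's dump.
SAMPLE = (
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01\n'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2'
    b'\\a08.56d829a7_56d82956d829a0.mxf\n'
    b'\x03HTAPp\x00\x00\x00\x02\x02\x00\x00\x01\x02L\x02\x00\x00\x00\x01\n'
    b'\x03LQ\x00\x00\x00\\\\Media\\Render_Drive\\mediafiles\\mxf\\k70255.2'
    b'\\a07.56d829a6_56d82956d829a0.mxf'
)

# Assumption: a setting name is one or more ASCII letters right after 0x03,
# and its value runs up to the next 0x03 byte.
SETTING_RE = re.compile(rb'\x03([A-Za-z]+)([^\x03]*)')


def iter_settings(data):
    """Yield (offset, name, raw_value) for every setting-like record."""
    for m in SETTING_RE.finditer(data):
        yield m.start(), m.group(1), m.group(2)


def clean_value(raw):
    """Strip NUL padding and line breaks from both ends, then decode.

    latin-1 maps every byte to a character, so it never raises, but the
    real file may use a different encoding (an assumption).
    """
    return raw.strip(b'\x00\r\n').decode('latin-1')


lq_paths = []
for offset, name, raw in iter_settings(SAMPLE):
    if name == b'LQ':
        lq_paths.append((offset, clean_value(raw)))

for offset, path in lq_paths:
    print(offset, path)
```

To run it on the real file, read the bytes with `with open(path, 'rb') as f: data = f.read()` and pass data to iter_settings; no seeking is needed because each match already carries its own offset.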
Upvotes: 1