Reputation: 21319
I'm reading a binary file made up of records that in C would look like this:
typedef struct _rec_t
{
    char text[20];
    unsigned char index[3];
} rec_t;
Now I'm able to parse this into a tuple with 23 distinct values, but I would prefer to use namedtuple to combine the first 20 bytes into text and the three remaining bytes into index. How can I achieve that? Basically, instead of one tuple of 23 values I'd prefer to have two tuples of 20 and 3 values respectively, and to access them by a "natural name", i.e. by means of namedtuple.
I am currently using the format "20c3B" for struct.unpack_from().
Note: There are many consecutive records in the string when I call parse_text.
My code (stripped down to the relevant parts):
#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple

def parse_text(data):
    fmt = "20c3B"
    l = len(data)
    sz = struct.calcsize(fmt)
    num = l/sz
    if not num:
        print "ERROR: no records found."
        return
    print "Size of record %d - number %d" % (sz, num)
    #rec = namedtuple('rec', 'text index')
    empty = struct.unpack_from(fmt, data)
    # Loop through elements
    # ...

def main():
    if len(sys.argv) < 2:
        print "ERROR: need to give file with texts as argument."
        sys.exit(1)
    s = os.path.getsize(sys.argv[1])
    f = open(sys.argv[1])
    try:
        data = f.read(s)
        parse_text(data)
    finally:
        f.close()

if __name__ == "__main__":
    main()
Upvotes: 9
Views: 17903
Reputation: 2830
Here's a subclass of Struct that packs from any sequence and unpacks to a class of your choosing:
from struct import Struct

class ObjectStruct(Struct):
    def __init__(self, *args, object_cls=tuple, **kwargs):
        super().__init__(*args, **kwargs)
        # object_cls is called as object_cls(*fields) on every unpack
        self._object_cls = object_cls

    def pack(self, object):
        return super().pack(*object)

    def pack_into(self, buffer, offset, object):
        return super().pack_into(buffer, offset, *object)

    def unpack(self, *args, **kwargs):
        return self._object_cls(*super().unpack(*args, **kwargs))

    def unpack_from(self, *args, **kwargs):
        return self._object_cls(*super().unpack_from(*args, **kwargs))

    def iter_unpack(self, *args, **kwargs):
        for item in super().iter_unpack(*args, **kwargs):
            yield self._object_cls(*item)
Here's how to use this with a namedtuple class:
from collections import namedtuple
WAISHeader = namedtuple("WAISHeader", "msg_len msg_type hdr_vers server compression encoding msg_checksum")
WAISHeaderStruct = ObjectStruct("! 10s c c 10s c c c", object_cls=WAISHeader)
headbytes = b"0000000142z2wais        0"
header = WAISHeaderStruct.unpack(headbytes)
headbytes2 = WAISHeaderStruct.pack(header)
Demo:
Python 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.20.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from collections import namedtuple
   ...: WAISHeader = namedtuple("WAISHeader", "msg_len msg_type hdr_vers server compression encoding msg_checksum")
   ...: WAISHeaderStruct = ObjectStruct("! 10s c c 10s c c c", object_cls=WAISHeader)
   ...: headbytes = b"0000000142z2wais        0"
   ...: header = WAISHeaderStruct.unpack(headbytes)
   ...: headbytes2 = WAISHeaderStruct.pack(header)
   ...:
In [2]: headbytes
Out[2]: b'0000000142z2wais        0'
In [3]: header
Out[3]: WAISHeader(msg_len=b'0000000142', msg_type=b'z', hdr_vers=b'2', server=b'wais      ', compression=b' ', encoding=b' ', msg_checksum=b'0')
In [4]: headbytes2
Out[4]: b'0000000142z2wais        0'
Upvotes: 0
Reputation: 76745
Here is my answer. I first wrote it using slicing instead of struct.unpack(), but @samy.vilar pointed out that we can just use the "s" format to actually get the string out. (I should have remembered that!)
This answer uses struct.unpack() twice: once to get the strings out, and once to unpack the second string as an integer.
I'm not sure what you want to do with the "3B" item, but I'm guessing you want to unpack it as a 24-bit integer. I prepended a 0 byte to the 3-byte string and unpacked the result as a big-endian 32-bit integer, in case that is what you want.
Slightly tricky: a line like n, = struct.unpack(...) unpacks a length-1 tuple into one variable. In Python, the comma makes the tuple, so with one comma after one name we are using tuple unpacking to unpack a length-1 tuple into a single variable.
Also, we can use a with statement to open the file, which eliminates the need for the try block. We can also just use f.read() to read the whole file in one go, with no need to compute the size of the file.
import struct
import sys

def parse_text(data):
    fmt = "20s3s"
    l = len(data)
    sz = struct.calcsize(fmt)
    if l % sz != 0:
        print("ERROR: input data not a multiple of record size")
        return
    num_records = l // sz
    if not num_records:
        print("ERROR: zero-length input file.")
        return
    ofs = 0
    while ofs < l:
        s, x = struct.unpack(fmt, data[ofs:ofs+sz])
        # x is a length-3 byte string; prepend a 0 byte and unpack as a 32-bit integer
        n, = struct.unpack(">I", b"\x00" + x)  # unpack 24-bit big-endian int
        ofs += sz
        # ... do something with s and with n or x

def main():
    if len(sys.argv) != 2:
        print("Usage: program_name <input_file_name>")
        sys.exit(1)
    _, in_fname = sys.argv
    # open in binary mode so the record bytes are read unchanged
    with open(in_fname, "rb") as f:
        data = f.read()
    parse_text(data)

if __name__ == "__main__":
    main()
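The 24-bit trick used in the loop above can be checked in isolation. The following is a minimal sketch for Python 3, assuming the three index bytes really are meant as one big-endian unsigned integer (the question does not confirm this). The helper name unpack_u24_be is made up for illustration.

```python
import struct

def unpack_u24_be(raw):
    """Decode exactly 3 bytes as an unsigned 24-bit big-endian integer."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pad with a leading zero byte so the value fits the 4-byte ">I" format.
    (value,) = struct.unpack(">I", b"\x00" + raw)
    return value

sample = b"\x01\x02\x03"
padded = unpack_u24_be(sample)
# Python 3 can also do this directly, without struct or padding.
direct = int.from_bytes(sample, "big")
print(padded, direct)
```

For little-endian data the zero byte would go at the end instead, with "<I" as the format.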
Upvotes: 4
Reputation: 11130
According to the docs: http://docs.python.org/library/struct.html
Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)
So in your case:
>>> import struct
>>> from collections import namedtuple
>>> data = "1"*23
>>> fmt = "20c3B"
>>> Rec = namedtuple('Rec', 'text index')
>>> r = Rec._make([struct.unpack_from(fmt, data)[0:20], struct.unpack_from(fmt, data)[20:]])
>>> r
Rec(text=('1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'), index=(49, 49, 49))
>>>
Slicing the unpacked values may be a problem. If the format were fmt = "20si", or something similar where each field unpacks to a single value rather than a run of individual bytes, we wouldn't need to do this:
>>> import struct
>>> from collections import namedtuple
>>> data = "1"*24
>>> fmt = "20si"
>>> Rec = namedtuple('Rec', 'text index')
>>> r = Rec._make(struct.unpack_from(fmt, data))
>>> r
Rec(text='11111111111111111111', index=825307441)
>>>
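Putting the pieces together for the record layout in the question, here is a rough Python 3 sketch that handles many consecutive records. It assumes the format "20s3B" (the text as one bytes object, the index as three integers) and uses Struct.iter_unpack, which needs Python 3.4 or later. The names Rec, REC_STRUCT and parse_records are illustrative, not from the question.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")
REC_STRUCT = struct.Struct("20s3B")  # 20-byte text + three unsigned bytes = 23 bytes

def parse_records(data):
    """Return a list of Rec tuples, one per 23-byte record in data."""
    if len(data) % REC_STRUCT.size:
        raise ValueError("input is not a multiple of the record size")
    records = []
    for fields in REC_STRUCT.iter_unpack(data):
        # fields is (text, b0, b1, b2); regroup the last three values.
        records.append(Rec(text=fields[0], index=fields[1:]))
    return records

# Two hand-built records, NUL-padded to 20 bytes of text each.
data = b"hello".ljust(20, b"\x00") + bytes([1, 2, 3])
data += b"world".ljust(20, b"\x00") + bytes([4, 5, 6])
recs = parse_records(data)
print(recs[0].text.rstrip(b"\x00"), recs[0].index)
```

Using "20s" rather than "20c" yields the text as a single value, so only the three index bytes need regrouping.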
Upvotes: 9
Reputation: 2909
Why not have parse_text use string slicing (data[:20], data[20:]) to pull apart the two values, and then process each one with struct?
Or take the 23 values and slice them apart into two?
I must be missing something. Perhaps you wish to make this happen via the struct module?
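The second suggestion, slicing the 23 unpacked values into two groups, might look like the sketch below. It keeps the question's original "20c3B" format and assumes Python 3; Rec and iter_records are hypothetical names chosen for the example.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")
FMT = "20c3B"                 # the format string from the question
SIZE = struct.calcsize(FMT)   # 23 bytes per record

def iter_records(data):
    """Yield one Rec per complete record; a trailing partial record is ignored."""
    for ofs in range(0, len(data) - SIZE + 1, SIZE):
        values = struct.unpack_from(FMT, data, ofs)
        # First 20 values are single-byte strings, last 3 are integers.
        yield Rec(text=values[:20], index=values[20:])

data = b"A" * 20 + bytes([7, 8, 9])
rec = next(iter_records(data))
print(len(rec.text), rec.index)
```

This gives exactly the two tuples of 20 and 3 values that the question asks for, at the cost of the text being 20 separate one-byte values.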
Upvotes: 3