Reputation: 21319

Is there an elegant way to use struct and namedtuple instead of this?

I'm reading a binary file made up of records that in C would look like this:

typedef struct _rec_t
{
  char text[20];
  unsigned char index[3];
} rec_t;

Right now I can parse this into a tuple of 23 separate values, but I would prefer to use namedtuple to combine the first 20 bytes into text and the remaining three bytes into index. How can I achieve that? Basically, instead of one tuple of 23 values, I'd prefer two tuples of 20 and 3 values respectively, accessed by a "natural name", i.e. by means of namedtuple.

I am currently using the format "20c3B" for struct.unpack_from().
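A minimal Python 3 sketch of the current situation, using a made-up sample record (its contents are hypothetical): "20c3B" returns one flat tuple in which each text character is its own 1-byte bytes object, followed by the three index bytes as ints.

```python
import struct

fmt = "20c3B"
# Hypothetical sample record: 20 text bytes (NUL-padded) followed by 3 index bytes.
record = b"hello".ljust(20, b"\x00") + bytes([1, 2, 3])

values = struct.unpack_from(fmt, record)
print(len(values))   # 23 separate values
print(values[:3])    # first three text characters, each a 1-byte bytes object
print(values[20:])   # the three index bytes as ints
```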

Note: There are many consecutive records in the string when I call parse_text.

My code (stripped down to the relevant parts):

#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple

def parse_text(data):
    fmt = "20c3B"
    l = len(data)
    sz = struct.calcsize(fmt)
    num = l // sz
    if not num:
        print("ERROR: no records found.")
        return
    print("Size of record %d - number %d" % (sz, num))
    #rec = namedtuple('rec', 'text index')
    empty = struct.unpack_from(fmt, data)
    # Loop through elements
    # ...

def main():
    if len(sys.argv) < 2:
        print("ERROR: need to give file with texts as argument.")
        sys.exit(1)
    s = os.path.getsize(sys.argv[1])
    f = open(sys.argv[1], "rb")  # binary mode, since the file holds raw records
    try:
        data = f.read(s)
        parse_text(data)
    finally:
        f.close()

if __name__ == "__main__":
    main()

Upvotes: 9

Answers (4)

smammy

Reputation: 2830

Here's a subclass of Struct that packs from any sequence and unpacks to a class of your choosing:

from struct import Struct

class ObjectStruct(Struct):
    def __init__(self, *args, object_cls=tuple, **kwargs):
        super().__init__(*args, **kwargs)
        self._object_cls = object_cls
    def pack(self, object):
        return super().pack(*object)
    def pack_into(self, buffer, offset, object):
        return super().pack_into(buffer, offset, *object)
    def unpack(self, *args, **kwargs):
        return self._object_cls(*super().unpack(*args, **kwargs))
    def unpack_from(self, *args, **kwargs):
        return self._object_cls(*super().unpack_from(*args, **kwargs))
    def iter_unpack(self, *args, **kwargs):
        for item in super().iter_unpack(*args, **kwargs):
            yield self._object_cls(*item)

Here's how to use this with a namedtuple class:

from collections import namedtuple
WAISHeader = namedtuple("WAISHeader", "msg_len msg_type hdr_vers server compression encoding msg_checksum")
WAISHeaderStruct = ObjectStruct("! 10s c c 10s c c c", object_cls=WAISHeader)
headbytes = b"0000000142z2wais        0"
header = WAISHeaderStruct.unpack(headbytes)
headbytes2 = WAISHeaderStruct.pack(header)

Demo:

Python 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from collections import namedtuple
   ...: WAISHeader = namedtuple("WAISHeader", "msg_len msg_type hdr_vers server compression encoding msg_checksum")
   ...: WAISHeaderStruct = ObjectStruct("! 10s c c 10s c c c", object_cls=WAISHeader)
   ...: headbytes = b"0000000142z2wais        0"
   ...: header = WAISHeaderStruct.unpack(headbytes)
   ...: headbytes2 = WAISHeaderStruct.pack(header)
   ...:

In [2]: headbytes
Out[2]: b'0000000142z2wais        0'

In [3]: header
Out[3]: WAISHeader(msg_len=b'0000000142', msg_type=b'z', hdr_vers=b'2', server=b'wais      ', compression=b' ', encoding=b' ', msg_checksum=b'0')

In [4]: headbytes2
Out[4]: b'0000000142z2wais        0'
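Applied to the record from the question, a sketch might look like this. It assumes Python 3.4+ (for iter_unpack) and the "20s3s" format, so that each record unpacks to exactly two fields; only the iter_unpack part of the class is repeated to keep it self-contained, and the sample data is made up.

```python
from collections import namedtuple
from struct import Struct

class ObjectStruct(Struct):
    """Struct subclass that unpacks into object_cls instead of a plain tuple."""
    def __init__(self, *args, object_cls=tuple, **kwargs):
        super().__init__(*args, **kwargs)
        self._object_cls = object_cls

    def iter_unpack(self, *args, **kwargs):
        # Wrap every record produced by Struct.iter_unpack in object_cls.
        for item in super().iter_unpack(*args, **kwargs):
            yield self._object_cls(*item)

# Mirrors the C rec_t: char text[20]; unsigned char index[3].
Rec = namedtuple("Rec", "text index")
RecStruct = ObjectStruct("20s3s", object_cls=Rec)

# Hypothetical input: two consecutive 23-byte records.
data = (b"first".ljust(20, b"\x00") + b"\x00\x00\x01"
        + b"second".ljust(20, b"\x00") + b"\x00\x01\x00")

records = list(RecStruct.iter_unpack(data))
for rec in records:
    print(rec.text.rstrip(b"\x00"), list(rec.index))
```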

Upvotes: 0

steveha

Reputation: 76745

Here is my answer. I first wrote it using slicing instead of struct.unpack() but @samy.vilar pointed out that we can just use the "s" format to actually get the string out. (I should have remembered that!)

This answer uses struct.unpack() twice: once to get the strings out, and once to unpack the second string as an integer.

I'm not sure what you want to do with the "3B" item, but I'm guessing you want to unpack it as a 24-bit integer. In case that is what you want, I prepended a 0 byte to the 3-byte string and unpacked the result as a big-endian 32-bit integer.
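A small sketch of that conversion with a made-up 3-byte value. The zero byte goes in front because the integer is big-endian, and in Python 3 the padding has to be a bytes object (b"\x00") rather than chr(0). In Python 3.2+, int.from_bytes() gives the same result without any padding.

```python
import struct

x = b"\x01\x02\x03"  # hypothetical 3-byte index field

# Prefix a zero byte so the 3 bytes fill a big-endian 4-byte unsigned int.
n_padded, = struct.unpack(">I", b"\x00" + x)

# Python 3 alternative: int.from_bytes reads any number of bytes directly.
n_direct = int.from_bytes(x, "big")

print(n_padded, n_direct)  # 66051 66051
```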

Slightly tricky: a line like n, = struct.unpack(...) unpacks a length-1 tuple into one variable. In Python, the comma makes the tuple, so with one comma after one name we use tuple unpacking to unpack a length-1 tuple into a single variable.
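For example, with arbitrarily chosen bytes:

```python
import struct

result = struct.unpack(">I", b"\x00\x00\x01\x00")
print(result)   # (256,)  struct.unpack always returns a tuple

n, = result     # trailing comma: unpack the 1-tuple into a single name
print(n)        # 256

(m,) = result   # equivalent, more explicit spelling
```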

Also, we can use a with to open the file, which eliminates the need for the try block. We can also just use f.read() to read the whole file in one go, with no need to compute the size of the file.

import struct
import sys

def parse_text(data):
    fmt = "20s3s"
    l = len(data)
    sz = struct.calcsize(fmt)

    if l % sz != 0:
        print("ERROR: input data not a multiple of record size")

    num_records = l // sz
    if not num_records:
        print("ERROR: zero-length input file.")
        return

    ofs = 0
    while ofs + sz <= l:  # stop before any trailing partial record
        s, x = struct.unpack(fmt, data[ofs:ofs+sz])
        # x is a length-3 bytes object; prepend a 0 byte and unpack as a 32-bit integer
        n, = struct.unpack(">I", b"\x00" + x)  # unpack 24-bit big-endian int
        ofs += sz
        ...  # do something with s and with n or x

def main():
    if len(sys.argv) != 2:
        print("Usage: program_name <input_file_name>")
        sys.exit(1)

    _, in_fname = sys.argv

    with open(in_fname, "rb") as f:  # binary mode: the file holds raw records
        data = f.read()
        parse_text(data)

if __name__ == "__main__":
    main()
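Since the question mentions many consecutive records, here is a sketch of the same idea using struct.iter_unpack (Python 3.4+) together with a namedtuple. The Rec and parse_records names, stripping trailing NULs from text, and treating index as a 24-bit big-endian integer are all assumptions about the data, not something the question specifies.

```python
import struct
from collections import namedtuple

# Hypothetical record type: text as bytes, index as a 24-bit big-endian integer.
Rec = namedtuple("Rec", "text index")
REC_FMT = "20s3s"

def parse_records(data):
    """Yield one Rec per 23-byte record in data."""
    size = struct.calcsize(REC_FMT)
    if len(data) % size:
        raise ValueError("input data is not a multiple of the record size")
    # struct.iter_unpack walks the buffer one record at a time.
    for text, raw_index in struct.iter_unpack(REC_FMT, data):
        # Assumes text is NUL-padded like a C char array.
        yield Rec(text.rstrip(b"\x00"), int.from_bytes(raw_index, "big"))

data = (b"alpha".ljust(20, b"\x00") + b"\x00\x00\x2a"
        + b"beta".ljust(20, b"\x00") + b"\x01\x00\x00")
for rec in parse_records(data):
    print(rec.text, rec.index)
```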

Upvotes: 4

Samy Vilar

Reputation: 11130

According to the docs: http://docs.python.org/library/struct.html

Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

So in your case:

>>> import struct
>>> from collections import namedtuple
>>> data = b"1"*23
>>> fmt = "20c3B"
>>> Rec = namedtuple('Rec', 'text index')
>>> fields = struct.unpack_from(fmt, data)
>>> r = Rec(fields[:20], fields[20:])
>>> r
Rec(text=(b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1', b'1'), index=(49, 49, 49))
>>>

Slicing the unpacked values may be a problem. If the format were something like fmt = "20si", where the fields don't come back as individual bytes, we wouldn't need to do this:

>>> import struct
>>> from collections import namedtuple
>>> data = b"1"*24
>>> fmt = "20si"
>>> Rec = namedtuple('Rec', 'text index')
>>> r = Rec._make(struct.unpack_from(fmt, data))
>>> r
Rec(text=b'11111111111111111111', index=825307441)
>>>
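A middle ground, assuming text should be a single bytes object but index should stay as three separate byte values: "20s3B" unpacks to four values, so only the last three need regrouping.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")
fmt = "20s3B"  # "20s" collapses the text into one bytes object; "3B" still gives 3 ints

data = b"1" * 23
fields = struct.unpack_from(fmt, data)   # 4 values: one bytes object, then 3 ints
r = Rec(fields[0], fields[1:])           # only the index part needs regrouping
print(r)
```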

Upvotes: 9

user1277476

Reputation: 2909

Why not have parse_text use string slicing (data[:20] and data[20:23] for a single record) to pull apart the two values, and then process each one with struct?

Or take the 23 values and slice them apart into two?

I must be missing something. Perhaps you wish to make this happen via the struct module?
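A minimal sketch of the slicing approach for a buffer of consecutive 23-byte records. parse_by_slicing is a hypothetical helper name, and any trailing partial record is simply skipped.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")
REC_SIZE = 23  # 20 text bytes + 3 index bytes, as in the C rec_t

def parse_by_slicing(data):
    """Slice each record apart first, then use struct only for the index bytes."""
    records = []
    for ofs in range(0, len(data) - REC_SIZE + 1, REC_SIZE):
        chunk = data[ofs:ofs + REC_SIZE]
        text = chunk[:20]                           # plain slice, no struct needed
        index = struct.unpack("3B", chunk[20:23])   # three ints, one per byte
        records.append(Rec(text, index))
    return records

# Hypothetical input: two records.
data = b"a" * 20 + b"\x01\x02\x03" + b"b" * 20 + b"\x04\x05\x06"
recs = parse_by_slicing(data)
```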

Upvotes: 3
