monkut
monkut

Reputation: 43840

Parsing binary data into ctypes Structure object via readinto()

I'm trying to handle a binary format, following the example here:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from ctypes import *
>>> class Point(Structure):
>>>     _fields_ = [ ('x',c_double), ('y',c_double), ('z',c_double) ]
>>>
>>> g = open("foo","rb") # point structure data
>>> q = Point()
>>> g.readinto(q)
24
>>> q.x
2.0

I've defined a Structure of my header and I'm trying to read data into my structure, but I'm having some difficulty. My structure is like this:

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ] 

The ctypes documentation says:

For integer type fields like c_int, a third optional item can be given. It must be a small positive integer defining the bit width of the field.

So for ("more_funky_numbers_7bytes", c_uint, 56), I've tried to define the field as a 7 byte field, but I'm getting the error:

ValueError: number of bits invalid for bit field

So my first problem, is how can I define a 7 byte int field?

Then If I skip that problem and comment out the "more_funky_numbers_7bytes" field, the resulting data get's loaded in.. but as expected only 1 character is loaded into "ascii_text_32bytes". And for some reason returns 16 which I assume is the calculated number of bytes it read into the structure... but If I'm commenting out my "funky number" field and ""ascii_text_32bytes" is only giving one char (1 byte), shouldn't that be 13, not 16???

Then I tried breaking out the char field into a separate structure, and reference that from within my Header structure. But that's not working either...

class StupidStaticCharField(BigEndianStructure):
    _fields_ = [
                ("ascii_text_1", c_byte),
                ("ascii_text_2", c_byte),
                ("ascii_text_3", c_byte),
                ("ascii_text_4", c_byte),
                ("ascii_text_5", c_byte),
                ("ascii_text_6", c_byte),
                ("ascii_text_7", c_byte),
                ("ascii_text_8", c_byte),
                ("ascii_text_9", c_byte),
                ("ascii_text_10", c_byte),
                ("ascii_text_11", c_byte),
                .
                .
                .
                ]

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", StupidStaticCharField),
                ("timestamp_4bytes", c_uint),
                #("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_ushort),
                ("other_flags_1byte", c_ushort),
                ("payload_length_2bytes", c_ushort),

                ] 

So, any ideas how to:

  1. Define a 7 byte field (which I'll need to decode using a defined function)
  2. Define a static char field of 32 bytes

UPDATE

I've found a structure that seems to work...

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char * 32),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_byte * 7),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]  

Now, however, my remaining question is, why when use .readinto():

f = open(binaryfile, "rb")

mystruct = BinaryHeader()
f.readinto(mystruct)

It's returning 52 and not the expected, 51. Where is that extra byte coming from, and where does it go?

UPDATE 2 For those interested here's an example of an alternative struct method to read values into a namedtuple mentioned by eryksun:

>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

Upvotes: 9

Views: 13750

Answers (1)

user1129665
user1129665

Reputation:

This line definition is actually for defining a bitfield:

...
("more_funky_numbers_7bytes", c_uint, 56),
...

which is wrong here. The size of a bitfield should be less than or equals the size of the type, so c_uint should be at most 32, one extra bit will raise the exception:

ValueError: number of bits invalid for bit field

Example of using the bitfield:

from ctypes import *

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits length
        ('a', c_uint8, 4), # first 4 bits of `a`
        ('b', c_uint8, 2), # next 2 bits of `a`
        ('c', c_uint8, 2), # next 2 bits of `a`
        ('d', c_uint8, 2), # since we are beyond the size of `a`
                           # new byte will be create and `d` will
                           # have the first two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy `mystruct` into `v`, I use Windows
cdll.msvcrt.memcpy(byref(v), byref(mystruct), sizeof(v))

print sizeof(mystruct) # 2 bytes, so 6 bits are left floating, you may
                       # want to memset with zeros
print bin(v.value)     # 0b1100110000

what you need is 7 bytes so what you endup doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...

As for the size for the structure, It's going to be 52, I extra byte will be padded to align the structure on 4 bytes on 32 bit processor or 8 bytes on 64 bits. Here:

from ctypes import *

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    '\x22' * 32,
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print sizeof(mystruct)

with open('data.txt', 'wb') as f:
    f.write(mystruct)

The extra byte is padded between other_flags_1byte and payload_length_2bytes in the file:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte

This is an issue when it comes to the file formats and network protocols. To change it pack it by 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...

the file will be:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww 

As for struct, it won't make it easier in your case. Sadly it doesn't support nested tuples in the format. For example here:

>>> from struct import *
>>>
>>> data = '\x11\x11\x11\x11\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22
\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x33
\x33\x33\x33\x44\x44\x44\x44\x44\x44\x44\x55\x66\x77\x77'
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>

This result cannot be used namedtuple, you still have parse it based on the index. It would work if you can do something like '>I(32c)(I)(7B)(B)(B)H'. This feature has been requested here (Extend struct.unpack to produce nested tuples) since 2003 but nothing is done since.

Upvotes: 7

Related Questions