thurizas

Reputation: 2518

How to programmatically construct a format string for struct.unpack?

I'm attempting to read and parse a binary file with Python.

The issue is that the data in the file can be in little-endian or big-endian format, as well as 32- or 64-bit values. In the file header there are a few bytes that specify the data format and size. Let's assume that I've read these in and I know the format and size, and I try to construct a format string as follows:

    if (bitOrder == 1):      # little-endian format
        strData = '<'
    elif (bitOrder == 2):    # big-endian format
        strData = '>'

    if (dataSize == 1):      # 32-bit data
        strLen = 'L'
    elif (dataSize == 2):    # 64-bit data
        strLen = 'q'

    strFormat = strData + strLen
    struct.unpack(strFormat, buf)

When I do this I get the error: "struct.error: unpack requires a string argument of length 2", yet if I write struct.unpack('<L', buf) I get the expected result.

In an interactive shell, type(strFormat) returns <type 'str'> and len(strFormat) returns 2.

So, being relatively new to Python, I have the following questions:

  1. Is not str the same as a string? If not, how do I convert between the two?

  2. How would I correctly construct the format string for use in an unpack function?

------ edit ------ to address comments:

  1. At this time I'm using Python 2.7 due to constraints of other projects.

  2. I'm trying to avoid posting my code (it's several hundred lines long), but here is an interactive Python session (run from inside Emacs, if that matters) that shows the behaviour I'm experiencing:

    Python 2.7.5 (default, Jun 17 2014, 18:11:42) 
    [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import array
    >>> import struct
    >>> header = array.array('B',[0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00,0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0x40, 0x04, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x11, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x09, 0x00, 0x40, 0x00, 0x1e, 0x00, 0x1b, 0x00])
    >>> entry = header[24:32]
    >>> phoff = header[32:40]
    >>> shoff = header[40:48]
    >>> strData = '<'
    >>> strLen = 'H'
    >>> strFormat = strData + strLen
    >>> print strFormat
    <H
    >>> type(strFormat)
    <type 'str'>
    >>> len(strFormat)
    2
    >>> temp = struct.unpack(strFormat, entry)
    Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
    struct.error: unpack requires a string argument of length 2
    >>> 
    
  3. Fixed typos in the original code.

Upvotes: 1

Views: 383

Answers (1)

Kevin

Reputation: 30151

Going by the interactive session, your problem would appear to be this:

    temp = struct.unpack(strFormat, entry)

Earlier, you said:

    entry = header[24:32]

entry is 8 bytes long, but strFormat says it should be 2 bytes long. That's what struct is complaining about.

It should also be a bytes object (str under 2.x), not an array.array.
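As a rough sketch of how both points could be handled together: build the format string from the header codes, ask struct.calcsize how many bytes that format expects, slice exactly that many, and convert the slice to a bytes object before unpacking. The names build_format, unpack_field, BYTE_ORDER and WORD_SIZE are made up for illustration, and I'm assuming unsigned values, so 'Q' is used for 64-bit data where the question had 'q'.

```python
import array
import struct

# Hypothetical lookup tables keyed by the header codes described in the question.
BYTE_ORDER = {1: '<', 2: '>'}   # 1 = little-endian, 2 = big-endian
WORD_SIZE = {1: 'L', 2: 'Q'}    # 1 = 32-bit, 2 = 64-bit (both unsigned, an assumption)


def build_format(bit_order, data_size):
    """Return a struct format string such as '<L' or '>Q'."""
    return BYTE_ORDER[bit_order] + WORD_SIZE[data_size]


def unpack_field(buf, offset, fmt):
    """Unpack a single value from buf, starting at offset."""
    size = struct.calcsize(fmt)            # number of bytes the format expects
    chunk = buf[offset:offset + size]      # slice exactly that many bytes
    # Convert the array.array slice to a bytes object (str under 2.x).
    return struct.unpack(fmt, bytes(bytearray(chunk)))[0]


# The eight bytes of header[24:32] from the question's session.
entry = array.array('B', [0x40, 0x04, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00])

fmt = build_format(1, 2)                   # little-endian, 64-bit
value = unpack_field(entry, 0, fmt)
print("%s -> %#x" % (fmt, value))
```

Because the slice length comes from struct.calcsize, the buffer and the format cannot disagree about size, which is the mismatch that triggered the error in the session. The bytes(bytearray(...)) conversion is there so the same code runs under both 2.7 and 3.x.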

Upvotes: 1
