Reputation: 321
I am very new at using Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.
I have a function in C that creates a .dat file containing data, and I am opening that file in Python to read it. One of the things I need to read is a struct that was created in the C function and written out in binary. In my Python code I am at the appropriate line of the file to read in the struct, and I have tried unpacking the struct both item by item and as a whole, without success. Most of the items in the struct were declared 'real' in the C code; I am working on this code with someone else, the main source code is his, and he declared the variables as 'real'.
I need to put this in a loop because I want to read all of the files in the directory that end in '.dat'. To start the loop I have:
for files in os.listdir(path):
    if files.endswith(".dat"):
        part = open(path + files, "rb")
        for line in part:
inside which I read all of the lines before the one containing the struct. Then I get to that line and have:
part_struct = part.readline()
r = struct.unpack('<d8', part_struct[0])
I'm trying to just read the first thing stored in the struct. I saw an example of this somewhere on here. And when I try this I'm getting an error that reads:
struct.error: repeat count given without format specifier
I will take any and all tips someone can give me. I have been stuck on this for a few days and have tried many different things. To be honest, I think I don't understand the struct module but I've read as much as I could on it.
Thanks!
Upvotes: 19
Views: 35388
Reputation: 7779
Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that defines the memory layout of your c-struct.
For example, here is some C++ code defining a struct (should work just as well for C structs, though I'm not a C expert):
#include <cstdint>
#include <fstream>
#include <string>

struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char     FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};

void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}
Based on the advice I found at stackoverflow.com/a/5397638, I've included some padding in the struct (the pad16 and pad32 fields) so that serialization will happen in a more predictable way. I think that this is a C++ thing; it might not be necessary when using plain ol' C structs.
Now, in python, we create a numpy.dtype object describing the memory layout of MyStruct:
import numpy as np

my_struct_dtype = np.dtype([
    ("FieldA", np.uint16),
    ("pad16",  np.uint16, (3,)),
    ("FieldB", np.uint32),
    ("pad32",  np.uint32, (2,)),
    ("FieldC", np.byte,   (4,)),
    ("FieldD", np.uint64),
    ("FieldE", np.uint64),
])
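As a quick sanity check (my own addition, assuming natural alignment on the C++ side), the dtype's itemsize should equal sizeof(MyStruct); with the padded layout above both come out to 40 bytes:

print(my_struct_dtype.itemsize)          # 40
assert my_struct_dtype.itemsize == 40    # should match sizeof(MyStruct) in C++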
Then use numpy's fromfile to read the binary file where you've saved your c-struct:
# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]
FieldA = struct_data["FieldA"]
FieldB = struct_data["FieldB"]
FieldC = struct_data["FieldC"]
FieldD = struct_data["FieldD"]
FieldE = struct_data["FieldE"]
if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...
The count=1 argument in the above call np.fromfile(..., count=1) is so that the returned array will have only one element; this means "read the first struct instance from the file". Note that I am indexing [0] to get that element out of the array.
If you have appended the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for the np.fromfile and np.frombuffer functions, means "read all of the data", resulting in a 1-dimensional array of shape (number_of_struct_instances,).
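For example (the filename here is just a placeholder of mine), reading every struct that was appended to one file could look like:

all_records = np.fromfile("many_structs.bin", dtype=my_struct_dtype, count=-1)
print(all_records.shape)        # (number_of_struct_instances,)
print(all_records["FieldB"])    # the FieldB value of every record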
You can also use the offset keyword argument to np.fromfile to control where in the file the read will begin.
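For instance, to skip past a fixed-size header before the structs (the 128-byte header is purely hypothetical, and the offset argument needs a reasonably recent numpy):

records = np.fromfile("data_with_header.bin", dtype=my_struct_dtype,
                      count=-1, offset=128)   # start reading after a 128-byte header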
To conclude, here are some numpy functions that will be useful once your custom dtype has been defined (a short round-trip sketch follows the list):

- np.frombuffer(bytes_data, dtype=...): Interpret the given binary data (e.g. a python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your c struct.
- np.fromfile(filename, dtype=...): Read binary data from filename. Should be the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).
- ndarray.tobytes(): Construct a python bytes instance containing the raw data from the given numpy array. If the array's data has a dtype corresponding to a c-struct, then the bytes coming from ndarray.tobytes can be deserialized by c/c++ and interpreted as an (array of) instances of that c-struct.
- ndarray.tofile(filename): Binary data from the array is written to filename. This data could then be deserialized by c/c++. Equivalent to open("filename", "wb").write(a.tobytes()).
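Putting those together, here is a minimal round-trip sketch (the filename and field values are placeholders of my own, not from the original):

record = np.zeros(1, dtype=my_struct_dtype)       # one zero-initialized struct
record["FieldA"] = 7
record["FieldC"] = [ord(ch) for ch in "expc"]

raw = record.tobytes()                            # bytes laid out like the C struct
restored = np.frombuffer(raw, dtype=my_struct_dtype)[0]
assert restored["FieldA"] == 7                    # the round trip preserves the field

record.tofile("roundtrip.bin")                    # the same bytes written straight to disk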
Upvotes: 1
Reputation: 414795
You could use ctypes.Structure or struct.Struct to specify the format of the file. To read structures from the file produced by the C code in @perreal's answer:
"""
struct { double v; int t; char c;};
"""
from ctypes import *
class YourStruct(Structure):
_fields_ = [('v', c_double),
('t', c_int),
('c', c_char)]
with open('c_structs.bin', 'rb') as file:
result = []
x = YourStruct()
while file.readinto(x) == sizeof(x):
result.append((x.v, x.t, x.c))
print(result)
# -> [(12.100000381469727, 17, 's'), (12.100000381469727, 17, 's'), ...]
See io.BufferedIOBase.readinto(). It is supported in Python 3, but it is undocumented in Python 2.7 for a default file object.
struct.Struct requires you to specify the padding bytes (x) explicitly:
"""
struct { double v; int t; char c;};
"""
from struct import Struct
x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
result = []
while True:
buf = file.read(x.size)
if len(buf) != x.size:
break
result.append(x.unpack_from(buf))
print(result)
It produces the same output.
To avoid unnecessary copying, Array.from_buffer(mmap_file) can be used to get an array of structs from a file:
import mmap  # Unix, Windows
from contextlib import closing

with open('c_structs.bin', 'rb') as file:
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        result = (YourStruct * 3).from_buffer(mm)  # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
Upvotes: 22
Reputation: 401
I had the same problem recently, so I made a module for the task, stored here: http://pastebin.com/XJyZMyHX
example code:
MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
uint8_t u8;
uint16_t u16;
uint32_t u32;
uint64_t u64;
int8_t i8;
int16_t i16;
int32_t i32;
int64_t i64;
long long int lli;
float flt;
double dbl;
char string[12];
uint64_t array[5];
} debugInfo;"""
PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06@\x14\xaeG\xe1z\x14\x08@testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'
if __name__ == '__main__':
    print "String:"
    print depack_bytearray_to_str(PACKED_STRUCT, MY_STRUCT, "<")
    print "Bytes in Stuct:" + str(structSize(MY_STRUCT))
    nt = depack_bytearray_to_namedtuple(PACKED_STRUCT, MY_STRUCT, "<")
    print "Named tuple nt:"
    print nt
    print "nt.string=" + nt.string
The result should be:
String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)
Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
Upvotes: 0
Reputation: 9162
A number in the format specifier means a repeat count, but it has to go before the letter, like '<8d'. However, you said you just want to read one element of the struct, so I guess you just want '<d'. I guess you are trying to specify the number of bytes to read as 8, but you don't need to do that; d implies it.
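For illustration (the zero-filled buffer is just dummy data), the count goes before the type code:

import struct

print(struct.calcsize('<8d'))             # 64: eight little-endian doubles
print(struct.unpack('<d', b'\x00' * 8))   # (0.0,): a single double
# struct.unpack('<d8', ...) raises "struct.error: repeat count given
# without format specifier"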
I also noticed you are using readline. That seems wrong for reading binary data: it will read until the next carriage return / line feed, which will occur randomly in binary data. What you want to do is use read(size), like this:
part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
Actually, you should be careful, as read can return less data than you request. You need to repeat it if it does.
part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise EOFError("unexpected end of file")  # note: IOException is not a Python exception
    part_struct += data
r = struct.unpack('<d', part_struct)
Upvotes: 0
Reputation: 98108
Some C code:
#include <stdio.h>

typedef struct { double v; int t; char c; } save_type;

int main() {
    save_type s = { 12.1f, 17, 's' };
    FILE *f = fopen("output", "wb");  /* "wb": open in binary mode */
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}
Some Python code:
import struct

with open('output', 'rb') as f:
    chunk = f.read(16)
    while chunk != "":
        print len(chunk)
        print struct.unpack('dicccc', chunk)
        chunk = f.read(16)
Output:
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
But there is also the padding issue: the padded size of save_type is 16, so we read 3 more characters and ignore them.
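As a side note (my own addition), the struct module can report the same padded size if you end the format with a zero-repeat 'd', which aligns the end of the struct to a double boundary:

import struct

print(struct.calcsize('dic'))     # 13: struct adds no trailing padding by default
print(struct.calcsize('dic0d'))   # 16: '0d' pads the end to double alignment,
                                  #     matching sizeof(save_type) on the C side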
Upvotes: 10