Reputation: 321
I am very new at using Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.
I have a function in C that creates a .dat file containing data, and I am opening that file in Python to read it. One of the things I need to read is a struct that was created in the C function and written out in binary. In my Python code I am at the appropriate line of the file to read in the struct, and I have tried unpacking the struct both item by item and as a whole, without success. Most of the items in the struct were declared 'real' in the C code; I am working on this code with someone else, the main source code is his, and he declared the variables as 'real'.
I need to put this in a loop because I want to read all of the files in the directory that end in '.dat'. To start the loop I have:
for files in os.listdir(path):
    if files.endswith(".dat"):
        part = open(path + files, "rb")
        for line in part:
inside which I read all of the lines before the one containing the struct. Then I get to that line and have:
part_struct = part.readline()
r = struct.unpack('<d8', part_struct[0])
I'm trying to just read the first thing stored in the struct. I saw an example of this somewhere on here. And when I try this I'm getting an error that reads:
struct.error: repeat count given without format specifier
I will take any and all tips someone can give me. I have been stuck on this for a few days and have tried many different things. To be honest, I think I don't understand the struct module but I've read as much as I could on it.
Thanks!
Upvotes: 19
Views: 35388
Reputation: 7779
Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that defines the memory layout of your c-struct.
For example, here is some C++ code defining a struct (should work just as well for C structs, though I'm not a C expert):
#include <cstdint>
#include <fstream>
#include <string>

struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char     FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};

void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}
Based on the advice I found at stackoverflow.com/a/5397638, I've included some padding in the struct (the pad16 and pad32 fields) so that serialization will happen in a more predictable way. I think that this is a C++ thing; it might not be necessary when using plain ol' C structs.
Now, in python, we create a numpy.dtype object describing the memory layout of MyStruct:
import numpy as np

my_struct_dtype = np.dtype([
    ("FieldA", np.uint16),
    ("pad16",  np.uint16, (3,)),
    ("FieldB", np.uint32),
    ("pad32",  np.uint32, (2,)),
    ("FieldC", np.byte,   (4,)),
    ("FieldD", np.uint64),
    ("FieldE", np.uint64),
])
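As a quick sanity check (my own addition, assuming natural alignment on the C++ side), the dtype's itemsize should equal sizeof(MyStruct); with the padded layout above both come out to 40 bytes:

print(my_struct_dtype.itemsize)          # 40
assert my_struct_dtype.itemsize == 40    # should match sizeof(MyStruct) in C++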
Then use numpy's fromfile to read the binary file where you've saved your c-struct:
# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]
FieldA = struct_data["FieldA"]
FieldB = struct_data["FieldB"]
FieldC = struct_data["FieldC"]
FieldD = struct_data["FieldD"]
FieldE = struct_data["FieldE"]
if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...
The count=1 argument in the above call np.fromfile(..., count=1) is so that the returned array will have only one element; this means "read the first struct instance from the file". Note that I am indexing [0] to get that element out of the array.
If you have appended the data from many c-structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for the np.fromfile and np.frombuffer functions, means "read all of the data", resulting in a 1-dimensional array of shape (number_of_struct_instances,).
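For example (the filename here is just a placeholder of mine), reading every struct that was appended to one file could look like:

all_records = np.fromfile("many_structs.bin", dtype=my_struct_dtype, count=-1)
print(all_records.shape)        # (number_of_struct_instances,)
print(all_records["FieldB"])    # the FieldB value of every record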
You can also use the offset keyword argument to np.fromfile to control where in the file the read will begin.
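For instance, to skip past a fixed-size header before the structs (the 128-byte header is purely hypothetical, and the offset argument needs a reasonably recent numpy):

records = np.fromfile("data_with_header.bin", dtype=my_struct_dtype,
                      count=-1, offset=128)   # start reading after a 128-byte header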
To conclude, here are some numpy functions that will be useful once your custom dtype has been defined (a short round-trip sketch follows the list):

- np.frombuffer(bytes_data, dtype=...): Interpret the given binary data (e.g. a python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your c struct.
- np.fromfile(filename, dtype=...): Read binary data from filename. Should be the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).
- ndarray.tobytes(): Construct a python bytes instance containing the raw data from the given numpy array. If the array's data has a dtype corresponding to a c-struct, then the bytes coming from ndarray.tobytes can be deserialized by c/c++ and interpreted as an (array of) instances of that c-struct.
- ndarray.tofile(filename): Binary data from the array is written to filename. This data could then be deserialized by c/c++. Equivalent to open("filename", "wb").write(a.tobytes()).
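Putting those together, here is a minimal round-trip sketch (the filename and field values are placeholders of my own, not from the original):

record = np.zeros(1, dtype=my_struct_dtype)       # one zero-initialized struct
record["FieldA"] = 7
record["FieldC"] = [ord(ch) for ch in "expc"]

raw = record.tobytes()                            # bytes laid out like the C struct
restored = np.frombuffer(raw, dtype=my_struct_dtype)[0]
assert restored["FieldA"] == 7                    # the round trip preserves the field

record.tofile("roundtrip.bin")                    # the same bytes written straight to disk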
Upvotes: 1
Reputation: 414795
You could use ctypes.Structure or struct.Struct to specify the format of the file. To read structures from the file produced by the C code in @perreal's answer:
"""
struct { double v; int t; char c;};
"""
from ctypes import *
class YourStruct(Structure):
_fields_ = [('v', c_double),
('t', c_int),
('c', c_char)]
with open('c_structs.bin', 'rb') as file:
result = []
x = YourStruct()
while file.readinto(x) == sizeof(x):
result.append((x.v, x.t, x.c))
print(result)
# -> [(12.100000381469727, 17, 's'), (12.100000381469727, 17, 's'), ...]
See io.BufferedIOBase.readinto(). It is supported in Python 3, but it is undocumented in Python 2.7 for a default file object.
struct.Struct requires you to specify the padding bytes (x) explicitly:
"""
struct { double v; int t; char c;};
"""
from struct import Struct
x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
result = []
while True:
buf = file.read(x.size)
if len(buf) != x.size:
break
result.append(x.unpack_from(buf))
print(result)
It produces the same output.
To avoid unnecessary copying, Array.from_buffer(mmap_file) can be used to get an array of structs from a file:
import mmap  # Unix, Windows
from contextlib import closing

with open('c_structs.bin', 'rb') as file:
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        result = (YourStruct * 3).from_buffer(mm)  # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
Upvotes: 22
Reputation: 401
I had the same problem recently, so I made a module for the task, stored here: http://pastebin.com/XJyZMyHX
example code:
MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
uint8_t u8;
uint16_t u16;
uint32_t u32;
uint64_t u64;
int8_t i8;
int16_t i16;
int32_t i32;
int64_t i64;
long long int lli;
float flt;
double dbl;
char string[12];
uint64_t array[5];
} debugInfo;"""
PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06@\x14\xaeG\xe1z\x14\x08@testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'
if __name__ == '__main__':
    print "String:"
    print depack_bytearray_to_str(PACKED_STRUCT, MY_STRUCT, "<")
    print "Bytes in Stuct:" + str(structSize(MY_STRUCT))
    nt = depack_bytearray_to_namedtuple(PACKED_STRUCT, MY_STRUCT, "<")
    print "Named tuple nt:"
    print nt
    print "nt.string=" + nt.string
The result should be:
String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)
Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
Upvotes: 0
Reputation: 9162
A number in the format specifier means a repeat count, but it has to go before the letter, like '<8d'. However, you said you just want to read one element of the struct, so I guess you just want '<d'. I guess you are trying to specify the number of bytes to read as 8, but you don't need to do that; d implies it.
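For illustration (the zero-filled buffer is just dummy data), the count goes before the type code:

import struct

print(struct.calcsize('<8d'))             # 64: eight little-endian doubles
print(struct.unpack('<d', b'\x00' * 8))   # (0.0,): a single double
# struct.unpack('<d8', ...) raises "struct.error: repeat count given
# without format specifier"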
I also noticed you are using readline. That seems wrong for reading binary data: it will read until the next carriage return / line feed, which will occur randomly in binary data. What you want to do is use read(size), like this:
part_struct = part.read(8)
r = struct.unpack('<d', part_struct)
Actually, you should be careful, as read can return less data than you request. You need to repeat it if it does.
part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise EOFError("unexpected end of file")  # note: IOException is not a Python exception
    part_struct += data
r = struct.unpack('<d', part_struct)
Upvotes: 0
Reputation: 98108
Some C code:
#include <stdio.h>

typedef struct { double v; int t; char c; } save_type;

int main() {
    save_type s = { 12.1f, 17, 's' };
    FILE *f = fopen("output", "wb");  /* "wb": open in binary mode */
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}
Some Python code:
import struct

with open('output', 'rb') as f:
    chunk = f.read(16)
    while chunk != "":
        print len(chunk)
        print struct.unpack('dicccc', chunk)
        chunk = f.read(16)
Output:
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
(12.100000381469727, 17, 's', '\x00', '\x00', '\x00')
But there is also the padding issue: the padded size of save_type is 16, so we read 3 more characters and ignore them.
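As a side note (my own addition), the struct module can report the same padded size if you end the format with a zero-repeat 'd', which aligns the end of the struct to a double boundary:

import struct

print(struct.calcsize('dic'))     # 13: struct adds no trailing padding by default
print(struct.calcsize('dic0d'))   # 16: '0d' pads the end to double alignment,
                                  #     matching sizeof(save_type) on the C side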
Upvotes: 10