Reputation: 64710
I'm trying to use Python to loop over a long binary file filled with 8-byte records.
Each record has the format [ uint16 | uint16 | uint32 ]
(which is "HHI" in struct formatting).
Apparently each 8-byte block is being treated as an int instead of an array of 8 bytes, which causes the struct.unpack call to fail:
with open(fname, "rb") as f:
    sz=struct.calcsize("HHI")
    print(sz) # This shows 8, as expected
    for raw in f.read(sz): # Expect this should read 8 bytes into raw
        print(type(raw)) # This says raw is an 'int', not a byte-array
        record=struct.unpack("HHI", raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)
How can I read my file as a series of structures, and print them each out?
Upvotes: 2
Views: 4405
Reputation: 4586
You can also do this using the walrus operator (:=), available since Python 3.8, which I find more concise and readable:
fname = '/tmp/foobar.txt'
size = 2
with open(fname, 'rb') as fp:
    while chunk := fp.read(size):
        print(chunk)
$ echo 'foobar' > /tmp/foobar.txt
$ python iter-chunks.py
b'fo'
b'ob'
b'ar'
b'\n'
This implements the solution the OP asked for:
I want the first 8bytes, then iterate to get the next 8, and the following 8, etc, until the full file has been processed
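Applied to the 8-byte "HHI" records from the question, the same loop could look roughly like the sketch below. The helper name read_records and the in-memory io.BytesIO sample are my own stand-ins for a real file opened with open(fname, "rb"), and skipping a trailing partial record is an assumption about what you want.

```python
import io
import struct

# uint16, uint16, uint32: 8 bytes per record with native alignment
RECORD = struct.Struct("HHI")


def read_records(fp):
    """Yield one (uint16, uint16, uint32) tuple per complete record in fp."""
    while chunk := fp.read(RECORD.size):
        if len(chunk) < RECORD.size:
            break  # trailing partial record: ignored in this sketch
        yield RECORD.unpack(chunk)


# In-memory stand-in for a binary file: two records plus 3 stray bytes.
sample = io.BytesIO(RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6) + b"\x00\x01\x02")
walrus_records = list(read_records(sample))
print(walrus_records)  # [(1, 2, 3), (4, 5, 6)]
```

The loop ends when read() returns b'' at end of file, since an empty bytes object is falsy.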
Upvotes: 0
Reputation: 55844
The iter builtin, if passed a callable and a sentinel value, will call the callable repeatedly until it returns the sentinel value.
So you can create a partial function with functools.partial (or use a lambda) and pass it to iter, like this:
import functools

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):  # Read 8-byte chunks until read() returns b'' at end of file
        print(chunk)  # Do stuff with chunk
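To tie this back to the question's "HHI" records, here is a minimal sketch that wraps the iter/partial pattern in a generator and unpacks each chunk. The names iter_records and demo_file are hypothetical, and io.BytesIO stands in for the real file object so the example runs on its own.

```python
import functools
import io
import struct

record_struct = struct.Struct("HHI")  # uint16, uint16, uint32


def iter_records(fp):
    """Yield unpacked records from a binary file-like object."""
    read_record = functools.partial(fp.read, record_struct.size)
    # iter() keeps calling read_record() until it returns the sentinel b''.
    for chunk in iter(read_record, b""):
        if len(chunk) != record_struct.size:
            break  # short final chunk: not a complete record
        yield record_struct.unpack(chunk)


demo_file = io.BytesIO(record_struct.pack(10, 20, 30) + record_struct.pack(40, 50, 60))
iter_result = list(iter_records(demo_file))
print(iter_result)  # [(10, 20, 30), (40, 50, 60)]
```

The length check matters because the last read() can return fewer than 8 bytes, which struct.unpack would reject.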
Upvotes: 4
Reputation: 6789
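f.read(len) returns a bytes object, and iterating over a bytes object yields one int per byte.
So in your for loop, raw is a single byte value (an int), not an 8-byte record.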
The correct way of looping is:
import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)
        if len(raw) != 8:
            break  # ignore the incomplete "record" if any
        record = struct.unpack("HHI", raw)
        print(record)
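If it helps to see the difference in isolation, this small sketch (no file needed; the bytes come from struct.pack, which is only my stand-in for f.read(8)) shows what a for loop over a bytes object actually hands you.

```python
import struct

raw = struct.pack("HHI", 1, 2, 3)   # stand-in for f.read(8)
print(type(raw), len(raw))          # a bytes object of length 8

elements = list(raw)                # what "for x in raw" iterates over
print(type(elements[0]))            # each element is an int (one byte value)

error_message = None
try:
    struct.unpack("HHI", elements[0])   # same mistake as in the question
except TypeError as exc:
    error_message = str(exc)
print(error_message)

unpacked = struct.unpack("HHI", raw)    # passing the whole bytes object works
print(unpacked)  # (1, 2, 3)
```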
Upvotes: 3
Reputation: 339
I've never used this before, but it looks like an initialization issue:
with open(fname, "rb") as f:
    fmt = 'HHI'
    raw=struct.pack(fmt,1,2,3)
    size=struct.calcsize(fmt)
    print(size) # This shows 8, as expected
    for raw in f.read(size): # Expect this should read 8 bytes into raw
        print(type(raw)) # This says raw is an 'int', not a byte-array
        record=struct.unpack(fmt, raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)
You may want to look at struct.iter_unpack() as an optimization if you have enough RAM to read the whole file into memory at once.
Note that as of Python 3.7, the format attribute of a compiled struct.Struct object is a str rather than bytes; see near the end of the page at https://docs.python.org/3/library/struct.html#struct.pack
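A rough sketch of the iter_unpack approach, assuming the whole file fits in memory. Here data is built with struct.pack as a stand-in for f.read(), and trimming the buffer to a whole number of records is my own addition, since struct.iter_unpack raises struct.error when the buffer length is not a multiple of the record size.

```python
import struct

fmt = "HHI"
record_size = struct.calcsize(fmt)

# Stand-in for data = f.read(): two records followed by 3 stray bytes.
data = struct.pack(fmt, 1, 2, 3) + struct.pack(fmt, 4, 5, 6) + b"\xff\xff\xff"

# Drop any trailing partial record so the length is a multiple of record_size.
usable = len(data) - len(data) % record_size
all_records = list(struct.iter_unpack(fmt, data[:usable]))
print(all_records)  # [(1, 2, 3), (4, 5, 6)]
```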
Upvotes: 0