Reputation: 13206
One of the answers for this question says that the following is a good way to read a large binary file without reading the whole thing into memory first:
with open(image_filename, 'rb') as content:
    for line in content:
        # do anything you want
I thought the whole point of specifying 'rb' was that line endings are ignored, so how could for line in content work?
Is this the most "Pythonic" way to read a large binary file or is there a better way?
Upvotes: 4
Views: 3655
Reputation: 104032
I would write a simple helper function to read in the chunks you want:
def read_in_chunks(infile, chunk_size=1024):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return
Then use it as you would use for line in file, like so:
with open(fn, 'rb') as f:
    for chunk in read_in_chunks(f):
        # do your stuff on that chunk...
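A self-contained sketch of the helper in action (the temporary file and sample data here are just for illustration):

```python
import os
import tempfile

def read_in_chunks(infile, chunk_size=1024):
    # Yield successive chunks until read() returns an empty bytes object (EOF).
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            return

# Write some sample binary data, then read it back in chunks.
data = bytes(range(256)) * 10  # 2560 bytes
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

chunks = []
with open(tmp.name, 'rb') as f:
    for chunk in read_in_chunks(f):
        chunks.append(chunk)

os.remove(tmp.name)
print(len(chunks))               # 3 chunks: 1024 + 1024 + 512 bytes
print(b''.join(chunks) == data)  # True
```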
BTW: I asked THIS question 5 years ago and this is a variant of an answer at that time...
You can also do:
from functools import partial
with open(fn, 'rb') as f:
    for chunk in iter(partial(f.read, numBytes), b''):
Upvotes: 4
Reputation: 114038
for line in fh
will split at newlines regardless of how you open the file. Binary files are often consumed in chunks instead:
CHUNK_SIZE = 1024
for chunk in iter(lambda: fh.read(CHUNK_SIZE), b''):
    do_something(chunk)
Upvotes: 4
Reputation: 225075
Binary mode means that line endings aren’t converted and that bytes objects are read (in Python 3); the file will still be read by “line” when using for line in f. I’d use read to read in consistent chunks instead, though.
with open(image_filename, 'rb') as f:
    # iter(callable, sentinel) – yield f.read(4096) until b'' appears
    for chunk in iter(lambda: f.read(4096), b''):
        …
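A quick round-trip check of the iter(callable, sentinel) idiom (the temporary file and payload here are just for illustration; note the sentinel must be b'' for a file opened in 'rb' mode, since read() returns bytes):

```python
import os
import tempfile

payload = b'\x00\x01\x02' * 3000  # 9000 bytes of sample binary data

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

pieces = []
with open(tmp.name, 'rb') as f:
    # read() returns b'' at EOF, which matches the sentinel and stops iteration
    for chunk in iter(lambda: f.read(4096), b''):
        pieces.append(chunk)

os.remove(tmp.name)
print([len(p) for p in pieces])     # [4096, 4096, 808]
print(b''.join(pieces) == payload)  # True
```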
Upvotes: 3