Startec

Reputation: 13206

Does "for line in file" work with binary files in Python?

One of the answers for this question says that the following is a good way to read a large binary file without reading the whole thing into memory first:

 with open(image_filename, 'rb') as content:
     for line in content:
         pass  # do anything you want

I thought the whole point of specifying 'rb' was that line endings are ignored, so how can for line in content work?

Is this the most "Pythonic" way to read a large binary file or is there a better way?

Upvotes: 4

Views: 3655

Answers (3)

dawg

Reputation: 104032

I would write a simple helper function to read in the chunks you want:

def read_in_chunks(infile, chunk_size=1024):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return

Then use it as you would for line in file, like so:

with open(fn, 'rb') as f:
    for chunk in read_in_chunks(f):
        pass  # do your stuff on that chunk...

BTW: I asked THIS question 5 years ago and this is a variant of an answer at that time...


You can also do:

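from functools import partial
with open(fn, 'rb') as f:
    # numBytes is the chunk size; in binary mode read() returns b'' at
    # end of file, so the sentinel must be b'' rather than ''
    for chunk in iter(partial(f.read, numBytes), b''):
        pass  # process the chunk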
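
A quick way to check both forms is to run them against an in-memory stream. The sketch below is only an illustration: it restates the helper so it runs on its own, uses io.BytesIO as a stand-in for a file opened with 'rb', and the 2500-byte payload is made up.

```python
import io
from functools import partial

def read_in_chunks(infile, chunk_size=1024):
    """Yield successive chunks until read() returns an empty result."""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            return
        yield chunk

# 2500 zero bytes standing in for the contents of a binary file (assumption)
data = bytes(2500)

# Generator form
sizes = [len(c) for c in read_in_chunks(io.BytesIO(data))]

# iter(callable, sentinel) form; the sentinel is b'' because the stream is binary
buf = io.BytesIO(data)
sizes_iter = [len(c) for c in iter(partial(buf.read, 1024), b'')]

print(sizes, sizes_iter)
```

Both loops should produce two full chunks followed by a shorter final one, so the code that consumes the chunks cannot assume every chunk is exactly chunk_size bytes long.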

Upvotes: 4

Joran Beasley

Reputation: 114038

for line in fh will split at newlines regardless of how you open the file.

Often with binary files you consume them in chunks instead:

CHUNK_SIZE = 1024
# fh must be opened in binary mode; read() then returns b"" at end of file
for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
    do_something(chunk)
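
To see what line iteration actually does with binary data, here is a small sketch. It uses io.BytesIO as a stand-in for a file opened with 'rb', and the sample bytes are invented for the example. It suggests two things: in binary mode the split happens only on the b"\n" byte (the terminator is kept, and "\r" is left alone), and data with no newline byte at all comes back as one single "line", so the loop gives no bound on memory use.

```python
import io

# Invented payload: a PNG-like header followed by arbitrary bytes
data = b"\x89PNG\r\n\x1a\nsome\x00binary\ndata"
lines = list(io.BytesIO(data))
for line in lines:
    print(repr(line))

# A blob that contains no newline byte (values 32..126 only)
blob = bytes(range(32, 127)) * 1000
single = list(io.BytesIO(blob))
print(len(single), len(single[0]))
```

With an image or other arbitrary binary format, where the newline bytes fall is effectively random, which is why fixed-size chunks are usually the more predictable choice.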

Upvotes: 4

Ry-

Reputation: 225075

Binary mode means that the line endings aren’t converted and that bytes objects are read (in Python 3); the file will still be read by “line” when using for line in f. I’d use read to read in consistent chunks instead, though.

with open(image_filename, 'rb') as f:
    # iter(callable, sentinel) – yield f.read(4096) until b'' appears
    for chunk in iter(lambda: f.read(4096), b''):
        …
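
As one possible use of a chunked loop like this, the sketch below hashes a file without loading it all into memory. The helper name sha256_of_file, the default chunk size, and the throwaway temp file are all assumptions made for the example, and it uses an assignment expression (Python 3.8+) as an alternative spelling of the iter/sentinel loop.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=4096):
    """Hash a file chunk by chunk (hypothetical helper, not from the answers)."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # read() returns b'' at end of file, which is falsy and ends the loop
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Usage with a temporary file filled with random bytes
payload = os.urandom(10_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    result = sha256_of_file(tmp.name)
finally:
    os.remove(tmp.name)

print(result == hashlib.sha256(payload).hexdigest())
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than available RAM.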

Upvotes: 3
