Dervin Thunk

Reputation: 20129

Reading every 100 lines (or fewer) from an open file in Python

I have a file with hundreds of thousands of records, one per line. I need to read 100 lines, process them, read another 100, process those, and so on. I don't want to load all of those records into memory at once. How do I read 100 lines (or fewer, when EOF is reached) at a time from an open file in Python?

Upvotes: 0

Views: 3985

Answers (5)

DNA

Reputation: 42607

A runnable example using the take recipe from the itertools page:

from itertools import islice

# Recipe from https://docs.python.org/2/library/itertools.html
def take(n, iterable):
    "Return first n items of the iterable as a list"
    return list(islice(iterable, n))

if __name__ == "__main__":
    with open('data.txt', 'r') as f:
        while True:
            lines = take(100, f)
            if lines:
                print(lines)
            else:
                break

Upvotes: 2

ggcarmi

Reputation: 468

file.readlines(sizehint)

Instead of creating your own iterator, you can use the built-in one.

Python's file.readlines() method returns a list of all the lines in the file; if the file is too big, that list won't fit in memory.

So you can pass the optional sizehint argument (positionally). It reads roughly sizehint bytes (not lines) from the file, plus enough more to complete the last line, and returns the lines read from that.

Only complete lines will be returned.

For example:

file.readlines(1000)

reads about 1000 bytes' worth of complete lines from the file.
Upvotes: 1

miradulo

Reputation: 29690

You could use izip_longest (zip_longest in Python 3) via the grouper recipe, which would also address your EOF issue:

from itertools import izip_longest

with open("my_big_file") as f:
    for chunk_100 in izip_longest(*[f] * 100):
        pass  # record my lines

Here we are simply iterating over the file's lines and specifying a fixed-length chunk of 100 lines.

A simple example of the grouper recipe (from the docs):

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
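
Note that izip_longest pads the final chunk with the fillvalue (None by default), so you will usually want to drop that padding before processing; a minimal sketch (the filename is a placeholder):

from itertools import izip_longest  # itertools.zip_longest on Python 3

with open("my_big_file") as f:
    for chunk_100 in izip_longest(*[f] * 100):
        # the last chunk is padded with None up to 100 items; drop the padding
        lines = [line for line in chunk_100 if line is not None]
        print(len(lines))  # process the batch of up to 100 lines here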

Upvotes: 1

ILostMySpoon

Reputation: 2409

islice() can be used to retrieve the next n items of an iterator. Since a file object is its own iterator, each call picks up where the previous one left off.

from itertools import islice

with open(...) as file:
    while True:
        lines = list(islice(file, 100))
        if not lines:
            break
        for line in lines:
            pass  # do stuff

Upvotes: 7

TigerhawkT3

Reputation: 49310

with open('file.txt', 'r') as f:
    workset = [] # start a work set
    for line in f: # iterate over file
        workset.append(line) # add current line to work set
        if len(workset) == 100: # if 100 items in work set,
            dostuff(workset) # send work set to processing
            workset = [] # make a new work set
    if workset: # if there's an unprocessed work set at the end (<100 items),
        dostuff(workset) # process it

Upvotes: 2
