Reputation: 20129
I have a file with hundreds of thousands of records, one per line. I need to read 100, process them, read another 100, process them, and so forth. I don't want to load all those records and keep them in memory. How do I read 100 lines at a time (or fewer, when EOF is encountered) from an open file using Python?
Upvotes: 0
Views: 3985
Reputation: 42607
A runnable example using the take recipe from the itertools page:
from itertools import islice

# Recipe from https://docs.python.org/2/library/itertools.html
def take(n, iterable):
    "Return first n items of the iterable as a list"
    return list(islice(iterable, n))

if __name__ == "__main__":
    with open('data.txt', 'r') as f:
        while True:
            lines = take(100, f)
            if lines:
                print(lines)
            else:
                break
Upvotes: 2
Reputation: 468
file.readlines(sizehint=<size in bytes>)
Instead of creating your own iterator, you can use the built-in one.
Python's file.readlines() method returns a list of all the lines in the file; if the file is too big, that list won't fit in memory. So you can use the sizehint parameter: it reads sizehint bytes (not lines) from the file, plus enough more to complete the last line, and returns the lines read from that. Only complete lines will be returned.
For example:
file.readlines(sizehint=1000)
will read roughly 1000 bytes from the file and return them as whole lines.
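Looping on that call until it returns an empty list gives you the batching the question asks for. A minimal sketch (note: `sizehint` is the Python 2 parameter name; in Python 3 the argument is named `hint` and is usually passed positionally, as below — `process_in_batches` and the byte count are illustrative):

```python
def process_in_batches(path, nbytes=65536):
    """Read a file in batches of complete lines totalling ~nbytes each."""
    total = 0
    with open(path) as f:
        while True:
            lines = f.readlines(nbytes)  # complete lines, ~nbytes worth
            if not lines:                # empty list means EOF
                break
            total += len(lines)          # stand-in for real processing
    return total
```

This keeps only one batch of lines in memory at a time, with the batch size bounded by bytes rather than by line count.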
Upvotes: 1
Reputation: 29690
You could utilize izip_longest
in the grouper
recipe, which would also address your EOF issue:
from itertools import izip_longest  # zip_longest in Python 3

with open("my_big_file") as f:
    for chunk_100 in izip_longest(*[f] * 100):
        ...  # record my lines (trailing slots are padded with None at EOF)
Here we are simply iterating over our file lines, and specifying our fixed-length chunk to be 100 lines.
A simple example of the grouper
recipe (from the docs):
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
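One caveat worth showing: because the last chunk is padded with fillvalue up to length n, you usually want to drop the padding before processing. A quick sketch (using zip_longest, the Python 3 name for izip_longest; the string input just stands in for file lines):

```python
from itertools import zip_longest  # izip_longest was renamed in Python 3

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

# Filter out the None padding from the final, short chunk:
chunks = [[x for x in chunk if x is not None]
          for chunk in grouper('ABCDEFG', 3)]
# chunks == [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```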
Upvotes: 1
Reputation: 2409
islice()
can be used to retrieve the next n
items of an iterator.
from itertools import islice

with open(...) as file:
    while True:
        lines = list(islice(file, 100))
        for line in lines:
            ...  # do stuff
        if not lines:
            break
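The same islice pattern can also be written without the explicit while loop, using the two-argument form of iter(), which calls a function repeatedly until it returns the sentinel (an empty list here). A sketch, with an in-memory StringIO standing in for the open file:

```python
from itertools import islice
import io

# 250 one-number lines stand in for the real file.
f = io.StringIO('\n'.join(str(i) for i in range(250)) + '\n')

chunks = 0
total = 0
# iter(callable, sentinel): keep calling until the result == []
for lines in iter(lambda: list(islice(f, 100)), []):
    chunks += 1
    total += len(lines)
# 250 lines -> chunks of 100, 100, and 50
```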
Upvotes: 7
Reputation: 49310
with open('file.txt', 'r') as f:
    workset = []                  # start a work set
    for line in f:                # iterate over file
        workset.append(line)      # add current line to work set
        if len(workset) == 100:   # if 100 items in work set,
            dostuff(workset)      # send work set to processing
            workset = []          # make a new work set
    if workset:                   # if there's an unprocessed work set at the end (<100 items),
        dostuff(workset)          # process it
Upvotes: 2