Reputation: 21
For example, I have 2,000 lines in a file, and I want to read 500 lines at a time and do something with these 500 lines before reading another 500 lines. I wonder if anyone would write some quick code for me to learn. Thanks!
Upvotes: 2
Views: 1468
Reputation: 29113
Correct me if I'm wrong, but I think this very basic sample will work too:
linesToProceed = 500
with open(filename, 'r') as f:
    lines = []
    for i, line in enumerate(f):
        lines.append(line)
        if (i + 1) % linesToProceed == 0:
            # do something with the 500 lines collected in lines
            lines = []
    if lines:
        # do something with the leftover lines (fewer than 500)
        pass
Upvotes: 0
Reputation: 53829
You could also use itertools.islice to read 500 lines at a time:
lines = itertools.islice(file_obj, 500)
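A single islice call like the one above only yields the first 500 lines, so to work through the whole file you would call it in a loop. A rough sketch (the file path is just a placeholder):
from itertools import islice

with open('/path/to/file') as file_obj:
    while True:
        chunk = list(islice(file_obj, 500))
        if not chunk:
            break
        # do something with the up-to-500 lines in chunk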
Upvotes: 0
Reputation: 26258
You could use a generator to group the lines together, and yield them in a way that is convenient to use in a simple for loop. This might get you started:
def chunks_of(iterable, chunk_size=500):
    out = []
    for item in iterable:
        out.append(item)
        if len(out) >= chunk_size:
            yield out
            out = []
    if out:
        yield out
You can then use this like:
for chunk_of_lines in chunks_of(open('/path/to/file'), chunk_size=500):
    # chunk_of_lines is 500 or fewer lines from the file
(Why "500 or fewer"? Because the last chunk might not be 500 lines if the number of lines in the file was not an even multiple of 500.)
Edit: Always check the docs first. Here's a recipe from the itertools docs:
from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
This creates a list of n iterators on the iterable (in this case, the file object). Since they are all iterators on the same underlying object, advancing one advances them all, so zipping them groups the items n at a time. izip_longest works like izip, but pads the final group with the fillvalue instead of dropping the leftover items; my chunks_of function, by contrast, just yields a shorter final chunk.
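For reference, a sketch of how this recipe might be applied to the question, assuming the grouper function defined above and a placeholder file path; the None padding that izip_longest adds to the final group is filtered out before use (in Python 3 the function is named itertools.zip_longest):
with open('/path/to/file') as f:
    for group in grouper(500, f):
        # drop the fillvalue padding from the last, possibly short group
        lines = [line for line in group if line is not None]
        # do something with the up-to-500 lines in lines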
Upvotes: 7