Reputation: 77
This question is twofold. Background: I have two large files. Each line of file 1 looks like "AATTGGCCAA" and each line of file 2 looks like "AATTTTCCAA". Each file has 20,000 lines, and I have Python code that I need to run on each pair of lines in turn.
First, how would you get the Python code to run on the same-numbered line of each file, e.g. line 1 of both files? Second, how would you then move both files on to line 2 after running on line 1, and so on?
Upvotes: 0
Views: 1198
Reputation: 414089
Here's the code that allows you to process lines in sync from multiple files:
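from contextlib import ExitStack

with ExitStack() as stack:
    # Open every file; ExitStack closes all of them when the block exits
    files = [stack.enter_context(open(filename)) for filename in filenames]
    # zip(*files) yields one tuple per line number: (line_from_file1, line_from_file2, ...)
    for lines in zip(*files):
        do_something(*lines)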
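For example, with 2 files it calls do_something(line_from_file1, line_from_file2) for each pair of lines in the given files, stopping when the shortest file runs out. Here filenames is your list of file paths and do_something is your own code.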
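A self-contained sketch of the ExitStack approach, assuming plain text files with one sequence per line. It writes two small sample files into a temporary directory (the file names and the pair_lines() helper are hypothetical, for illustration only) and collects the line pairs with trailing newlines stripped. Because the list of files is not fixed, the same helper works unchanged for three or more inputs.

```python
import os
import tempfile
from contextlib import ExitStack


def pair_lines(filenames):
    """Hypothetical helper: return one tuple of stripped lines per line number."""
    pairs = []
    with ExitStack() as stack:
        # Open every file; ExitStack closes all of them when the block ends
        files = [stack.enter_context(open(name)) for name in filenames]
        # zip(*files) advances every file by exactly one line per iteration
        for lines in zip(*files):
            pairs.append(tuple(line.rstrip("\n") for line in lines))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files using the sequences from the question
    path1 = os.path.join(tmp, "file1.txt")
    path2 = os.path.join(tmp, "file2.txt")
    with open(path1, "w") as f:
        f.write("AATTGGCCAA\nAATTGGCCAA\n")
    with open(path2, "w") as f:
        f.write("AATTTTCCAA\nAATTTTCCAA\n")
    result = pair_lines([path1, path2])
    print(result)
```

Collecting into a list is only for demonstration; for 20,000-line files you would call your processing code inside the loop instead of storing every pair.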
Upvotes: 0
Reputation: 104692
File objects are iterators. You can pass them to any function that expects an iterable object and it will work. For your specific use case, you want to use the zip builtin function, which iterates over several objects in parallel and yields tuples with one object from each iterable.
with open(filename1) as file1, open(filename2) as file2:
    for line1, line2 in zip(file1, file2):
        do_something(line1, line2)
In Python 3, zip returns an iterator, so this is efficient: only one line from each file is held in memory at a time. If you needed to do the same thing in Python 2, you'd probably want to use itertools.izip instead, as the regular zip would read all the data from both files into a list up front.
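To make do_something concrete, here is a sketch in which it is a hypothetical function that returns the positions where the two sequences differ. io.StringIO stands in for the real files so the example runs without touching disk; a file object returned by open() iterates the same way. Each line still carries its trailing newline, so it is stripped before comparing.

```python
import io


def do_something(line1, line2):
    """Hypothetical per-pair work: return the 0-based positions where the lines differ."""
    seq1, seq2 = line1.rstrip("\n"), line2.rstrip("\n")
    return [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


# StringIO objects behave like open text files: iterating yields one line at a time
file1 = io.StringIO("AATTGGCCAA\nAATTGGCCAA\n")
file2 = io.StringIO("AATTTTCCAA\nAATTGGCCAA\n")

results = []
for line1, line2 in zip(file1, file2):
    results.append(do_something(line1, line2))

print(results)  # [[4, 5], []]
```

On Python 3.10 and later, zip(file1, file2, strict=True) raises ValueError if one file has more lines than the other, instead of silently stopping at the end of the shorter one.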
Upvotes: 3
Reputation: 11290
The following code uses two Python features:
1. Generator function
2. File object treated as iterator
def get_line(file_path):
    # Generator function: its body runs lazily, one line per next() call
    with open(file_path) as file_obj:
        for line in file_obj:
            # Give one line and return control to the calling scope
            yield line

# Calling get_line() does not execute its body yet;
# instead we get two generator instances
lines_a = get_line(path_to_file_a)
lines_b = get_line(path_to_file_b)

while True:
    try:
        # Now grab one line from each generator
        line_pair = (next(lines_a), next(lines_b))
    except StopIteration:
        # One of the files hit EOF, so exit the loop
        break
    do_something(line_pair)

# Close both generators so the file left open by the longer one is closed promptly
lines_a.close()
lines_b.close()
This assumes your code is wrapped in a do_something(line_pair) function that accepts a tuple of length 2 holding the pair of lines.
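The while True / next() / StopIteration loop above does essentially what zip does for you. A minimal sketch comparing the two, using io.StringIO in place of real files and a hypothetical lines_from() generator that takes an already-open file object instead of a path:

```python
import io


def lines_from(file_obj):
    """Hypothetical variant of get_line() that accepts an already-open file object."""
    for line in file_obj:
        yield line


def pairs_manual(f1, f2):
    # Explicit next()/StopIteration loop, as in the answer above
    gen_a, gen_b = lines_from(f1), lines_from(f2)
    pairs = []
    while True:
        try:
            pairs.append((next(gen_a), next(gen_b)))
        except StopIteration:
            # One input is exhausted, so stop pairing
            break
    return pairs


def pairs_zip(f1, f2):
    # zip pulls one item from each iterator per step and stops at the shortest
    return list(zip(lines_from(f1), lines_from(f2)))


manual = pairs_manual(io.StringIO("a\nb\nc\n"), io.StringIO("x\ny\n"))
zipped = pairs_zip(io.StringIO("a\nb\nc\n"), io.StringIO("x\ny\n"))
print(manual == zipped)  # True
```

Both stop after the second pair because the second input has only two lines; the generator approach is mainly useful when each stream needs its own per-line preprocessing before pairing.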
Upvotes: 0
Reputation: 9977
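File objects are iterators. You can open them and then call next() on a file object to get its next line. (In Python 2 this was the .next() method; in Python 3, use the built-in next(file2).) For example:
with open(filename1) as file1, open(filename2) as file2:
    for line in file1:
        # Advance file2 by one line; raises StopIteration if file2 is shorter
        other_line = next(file2)
        do_something(line, other_line)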
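In Python 3 the call is next(file2), which raises StopIteration once file2 runs out. If the two files might not have the same number of lines, a sketch using the two-argument form next(iterator, default), which returns the default instead of raising. io.StringIO stands in for the real files, and using None as the end marker is an arbitrary choice:

```python
import io

file1 = io.StringIO("AATTGGCCAA\nAATTGGCCAA\nAATTGGCCAA\n")
file2 = io.StringIO("AATTTTCCAA\nAATTTTCCAA\n")  # one line shorter than file1

pairs = []
for line in file1:
    # Returns None instead of raising StopIteration when file2 is exhausted
    other_line = next(file2, None)
    if other_line is None:
        # file2 ran out first; stop here (or report the length mismatch)
        break
    pairs.append((line.rstrip("\n"), other_line.rstrip("\n")))

print(len(pairs))  # 2
```

If you instead want to keep going to the end of the longer file, itertools.zip_longest(file1, file2) from the standard library pads the missing side with None.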
Upvotes: 0