thh32

Reputation: 77

Make a Python program read two files line by line in sync and run code on each pair of lines

This question is twofold. Background: I have two large files. Each line of file 1 is like "AATTGGCCAA" and each line of file 2 is like "AATTTTCCAA". Each file has 20,000 lines, and I have Python code that I need to run on each pair of lines in turn.

First, how would you get the Python code to run on the same-numbered line of each file, e.g. line 1 of both files? Second, how would you get it to move down to line 2 in both files after processing line 1, and so on?

Upvotes: 0

Views: 1198

Answers (4)

jfs

Reputation: 414089

Here's the code that allows you to process lines in sync from multiple files:

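from contextlib import ExitStack

# filenames is a list of paths; do_something is your per-line function
with ExitStack() as stack:
    # ExitStack closes every file it entered when the block exits
    files = [stack.enter_context(open(filename)) for filename in filenames]
    # zip(*files) yields one tuple of lines per line number, stopping at the shortest file
    for lines in zip(*files):
        do_something(*lines)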

For example, with two files it calls do_something(line_from_file1, line_from_file2) for each pair of lines in the given files.
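A self-contained sketch of the same ExitStack + zip pattern, wrapped in a function so it can be tried end to end. The function name process_in_sync and the two small sample files are hypothetical, made up to resemble the question's data; only ExitStack, open and zip come from the answer above.

```python
import os
import tempfile
from contextlib import ExitStack


def process_in_sync(filenames, do_something):
    """Hypothetical helper: call do_something(*lines) once per line number across all files."""
    with ExitStack() as stack:
        # enter_context() registers each file so that all of them are closed on exit
        files = [stack.enter_context(open(name)) for name in filenames]
        # zip(*files) yields one tuple per line number and stops at the shortest file
        for lines in zip(*files):
            do_something(*lines)


results = []
with tempfile.TemporaryDirectory() as tmpdir:
    # Two small sample files standing in for the real 20,000-line inputs
    samples = {"file1.txt": "AATTGGCCAA\nCCGGTTAA\n", "file2.txt": "AATTTTCCAA\nCCGGAAAA\n"}
    paths = []
    for name, content in samples.items():
        path = os.path.join(tmpdir, name)
        with open(path, "w") as f:
            f.write(content)
        paths.append(path)

    # Strip the trailing newline from each line before storing the pair
    process_in_sync(paths, lambda a, b: results.append((a.rstrip("\n"), b.rstrip("\n"))))

print(results)
```

Because the file list is built dynamically, the same function works unchanged for three or more files; do_something just has to accept that many arguments.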

Upvotes: 0

Blckknght

Reputation: 104692

File objects are iterators. You can pass them to any function that expects an iterable and it will work. For your use case, you want the zip builtin, which iterates over several iterables in parallel and yields tuples containing one item from each.

with open(filename1) as file1, open(filename2) as file2:
    for line1, line2 in zip(file1, file2):
        do_something(line1, line2)
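To see the tuples zip produces without creating real files, io.StringIO objects can stand in for open files, since they also iterate line by line. This sketch uses made-up sample lines and also shows one behaviour worth knowing: zip stops silently at the shorter input, so the extra line in the first "file" is never paired.

```python
import io

# io.StringIO objects stand in for open files: both iterate line by line
file1 = io.StringIO("AATTGGCCAA\nGGCCAATT\nEXTRA\n")
file2 = io.StringIO("AATTTTCCAA\nTTAAGGCC\n")

pairs = list(zip(file1, file2))
print(pairs)
# [('AATTGGCCAA\n', 'AATTTTCCAA\n'), ('GGCCAATT\n', 'TTAAGGCC\n')]
```

If the files might differ in length and the leftover lines matter, itertools.zip_longest pads the shorter input with a fill value instead of stopping.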

In Python 3, zip returns an iterator, so this is efficient. If you needed to do the same thing in Python 2, you'd probably want to use itertools.izip instead, as the regular zip would read all the data from both files into a list up front.
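A small Python 3 sketch of why zip is memory-friendly: it pulls one item from each input only when the next tuple is requested. The tracked generator and the consumed list are hypothetical instrumentation added just to make the pulls visible.

```python
consumed = []


def tracked(name, items):
    # Hypothetical helper: records every item at the moment zip pulls it
    for item in items:
        consumed.append((name, item))
        yield item


pairs = zip(tracked("a", ["A1", "A2", "A3"]), tracked("b", ["B1", "B2", "B3"]))
before = list(consumed)  # nothing has been pulled yet
first = next(pairs)      # pulls exactly one item from each input
print(before)    # []
print(first)     # ('A1', 'B1')
print(consumed)  # [('a', 'A1'), ('b', 'B1')]
```

With real files the effect is the same: only the current pair of lines needs to be in memory, regardless of how many lines the files have.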

Upvotes: 3

ElmoVanKielmo

Reputation: 11290

The following code uses two Python features:
1. Generator functions
2. File objects treated as iterators

def get_line(file_path):
    # Generator function: yields the file's lines one at a time
    with open(file_path) as file_obj:
        for line in file_obj:
            # Give one line and return control to the calling scope
            yield line

# The generator function bodies do not run here;
# instead we get two generator instances
lines_a = get_line(path_to_file_a)
lines_b = get_line(path_to_file_b)
while True:
    try:
        # Now grab one line from each generator
        line_pair = (next(lines_a), next(lines_b))
    except StopIteration:
        # This exception means we hit EOF in one of the files, so exit the loop
        break
    do_something(line_pair)

This assumes your code is wrapped in a do_something(line_pair) function that accepts a tuple of length 2 holding the pair of lines.
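Here is a runnable sketch of the generator approach above, with the pairing loop moved into a hypothetical paired_lines helper that collects the tuples instead of calling do_something. It also closes both generators in a finally block, an addition of this sketch, so that a file left half-read when the other one ends is closed promptly rather than whenever the generator is garbage-collected. The sample files are made up.

```python
import os
import tempfile


def get_line(file_path):
    # Generator function: the file is opened on the first next() call
    with open(file_path) as file_obj:
        for line in file_obj:
            yield line


def paired_lines(path_a, path_b):
    """Hypothetical helper: collect (line_a, line_b) tuples until either file ends."""
    lines_a = get_line(path_a)
    lines_b = get_line(path_b)
    pairs = []
    try:
        while True:
            try:
                line_pair = (next(lines_a), next(lines_b))
            except StopIteration:
                break  # one of the files reached EOF
            pairs.append(line_pair)
    finally:
        # close() finishes each generator, so its with-block closes the file
        lines_a.close()
        lines_b.close()
    return pairs


with tempfile.TemporaryDirectory() as tmpdir:
    path_a = os.path.join(tmpdir, "a.txt")
    path_b = os.path.join(tmpdir, "b.txt")
    with open(path_a, "w") as f:
        f.write("AATTGGCCAA\nGGCC\n")
    with open(path_b, "w") as f:
        f.write("AATTTTCCAA\nTTAA\n")
    result = paired_lines(path_a, path_b)

print(result)
```

In practice this loop does the same job as zip(lines_a, lines_b); the explicit version just makes each step of "read one line from each, then move down" visible.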

Upvotes: 0

Paul Becotte

Reputation: 9977

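File objects are iterators. You can open them and then call next() on one of them to get its next line (in Python 2 this was the file2.next() method). For example:

with open(filename1) as file1, open(filename2) as file2:
    for line in file1:
        # next() reads the matching line from the second file
        other_line = next(file2)
        do_something(line, other_line)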
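One caveat with this pattern: if the second file is shorter, next(file2) raises StopIteration inside the loop body, which is not caught by the for loop and ends the program with an error. A sketch of one way around it, assuming you simply want to stop at the shorter file, uses the two-argument form next(iterator, default). The process_pairs name is hypothetical, and io.StringIO stands in for the two opened files.

```python
import io


def process_pairs(file1, file2, do_something):
    # Hypothetical helper: iterate file1 and pull the matching line from file2
    for line in file1:
        other_line = next(file2, None)  # None instead of StopIteration at EOF
        if other_line is None:
            break  # file2 is shorter; stop instead of crashing
        do_something(line, other_line)


seen = []
# io.StringIO stands in for two opened files of unequal length
process_pairs(
    io.StringIO("AATTGGCCAA\nGGCC\nEXTRA\n"),
    io.StringIO("AATTTTCCAA\nTTAA\n"),
    lambda a, b: seen.append((a.strip(), b.strip())),
)
print(seen)
```

None is a safe sentinel here because lines read from a text file are always strings.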

Upvotes: 0
