Reputation: 87

Reading two lines at a time from two different files in Python

I have two files as shown below:

File 1 (tab delimited):

A1   someinfo1     someinfo2    someinfo3
A1   someinfo1     someinfo2    someinfo3
B1   someinfo1     someinfo2    someinfo3
B1   someinfo1     someinfo2    someinfo3

File 2 (tab delimited):

A1   newinfo1     newinfo2    newinfo3
A1   newinfo1     newinfo2    newinfo3
B1   newinfo1     newinfo2    newinfo3
B1   newinfo1     newinfo2    newinfo3

I want to read two lines together (the two lines starting with A1) from File 1 and the corresponding two lines (also starting with A1) from File 2. To be clear, I have two requirements:

1) Read two lines at a time from the same file.
2) Read the same two lines from the other file.

To be precise, I want to read four lines together: two consecutive lines from each of the two files.

I searched online and found code that reads two lines together, but only from one file:

import itertools
with open(File1) as file1:
    for line1, line2 in itertools.izip_longest(*[file1]*2):
        print line1, line2

I was also able to read one line from each of the two files:

for i, (line1, line2) in enumerate(itertools.izip(f1, f2)):
    print line1, line2

But I want to do something like this:

Pseudocode:

for line1, line2 from file1 and line_1 and line_2 from file2:
              compare line1 with line2
              compare line1 with line_1
              compare line2 with line_1
              compare line2 with line_2

I am hoping for a linear-time solution. Both files have the same number of lines. The first column (the primary id) is the same for consecutive lines within a file, and the other file follows the same order (see the example above).

Thanks.

Upvotes: 3

Answers (4)

John La Rooy

Reputation: 304473

>>> from itertools import izip
>>> with open("file1") as file1, open("file2") as file2:
...     for a1, a2, b1, b2 in izip(file1, file1, file2, file2):
...         print a1, a2, b1, b2
... 
A1   someinfo1     someinfo2    someinfo3
A1   someinfo1     someinfo2    someinfo3
A1   newinfo1     newinfo2    newinfo3
A1   newinfo1     newinfo2    newinfo3

B1   someinfo1     someinfo2    someinfo3
B1   someinfo1     someinfo2    someinfo3
B1   newinfo1     newinfo2    newinfo3
B1   newinfo1     newinfo2    newinfo3

You can make the number of lines a parameter (n) like this:

for lines in izip(*[file1]*n+[file2]*n):

Now lines will be a tuple with n*2 elements: the first n come from file1 and the next n from file2.
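
A minimal sketch of the same idea for Python 3, where the built-in zip is already lazy and takes the place of izip. The helper name read_groups, the sample data, and the StringIO objects standing in for real files are assumptions made for illustration:

```python
import io

def read_groups(file1, file2, n=2):
    """Yield tuples holding n lines from file1 followed by n lines from file2."""
    # Listing the same file object n times makes zip pull n consecutive
    # lines from it for every output tuple.
    return zip(*([file1] * n + [file2] * n))

# StringIO objects stand in for the two open files (hypothetical data).
file1 = io.StringIO("A1\told1\nA1\told2\nB1\told3\nB1\told4\n")
file2 = io.StringIO("A1\tnew1\nA1\tnew2\nB1\tnew3\nB1\tnew4\n")

groups = list(read_groups(file1, file2, n=2))
for a1, a2, b1, b2 in groups:
    print(a1, a2, b1, b2, sep="")
```

Because zip stops at the shortest input, a trailing group with fewer than n lines is silently dropped.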

Upvotes: 1

jfs

Reputation: 414875

Here's a generalization that allows any number of consecutive lines with the same id column:

from itertools import groupby, izip, product

getid = lambda line: line.partition("\t")[0] # first tab-separated column (the files are tab-delimited)
same_id = lambda lines: groupby(lines, key=getid)

with open(File1) as file1, open(File2) as file2:
    for (id1, lines1), (id2, lines2) in izip(same_id(file1), same_id(file2)):
        if id1 != id2:
            # handle error here
            break
        # compare all possible combinations
        for a, b in product(lines1, lines2):
            compare(a, b)
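
For reference, here is a rough Python 3 sketch of the same grouping approach, with zip in place of izip. The function names get_id and pair_by_id, the sample data, and the choice to raise ValueError on an id mismatch are assumptions for illustration. Instead of calling a compare function, it simply collects the pairs that would be compared:

```python
import io
from itertools import groupby, product

def get_id(line):
    # Treat the first tab-separated column as the id.
    return line.partition("\t")[0]

def pair_by_id(file1, file2):
    """Return every (line from file1, line from file2) pair within matching id groups."""
    pairs = []
    groups1 = groupby(file1, key=get_id)
    groups2 = groupby(file2, key=get_id)
    for (id1, lines1), (id2, lines2) in zip(groups1, groups2):
        if id1 != id2:
            raise ValueError("id mismatch: %r != %r" % (id1, id2))
        # product() reads both groups fully before yielding, so the
        # shared groupby iterators are consumed safely here.
        pairs.extend(product(lines1, lines2))
    return pairs

# Hypothetical data: three A1 lines in the first file, two in the second.
file1 = io.StringIO("A1\ta\nA1\tb\nA1\tc\nB1\td\n")
file2 = io.StringIO("A1\tw\nA1\tx\nB1\ty\nB1\tz\n")

pairs = pair_by_id(file1, file2)
print(len(pairs))
```

Each group must be used before the loop advances, because groupby invalidates the previous group once the next one is requested.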

Upvotes: 0

abarnert

Reputation: 366133

Let's see how we can put these together. First:

with open(File1) as file1:
    for line1,line2 in itertools.izip_longest(*[file1]*2):

Well, take out the for loop and you've got a 2-line-at-a-time iterator over the file, right? So, you can do the same for file2. And then you can zip the two together:

with open(File1) as file1, open(File2) as file2:
    f1 = itertools.izip_longest(*[file1]*2)
    f2 = itertools.izip_longest(*[file2]*2)
    for i,((f1_line1, f1_line2), (f2_line1, f2_line2)) in enumerate(itertools.izip(f1,f2)):
        # do stuff

But you really don't want to do this.

First, most people don't intuitively read izip_longest(*[file1]*2) and realize that it's grouping by pairs. Wrap that up as a function. In fact, don't even write the function yourself; take grouper right out of the itertools documentation.

So now, it's:

with open(File1) as file1, open(File2) as file2:
    pairs1 = grouper(2, file1)
    pairs2 = grouper(2, file2)
    for i,((f1_line1, f1_line2), (f2_line1, f2_line2)) in enumerate(itertools.izip(pairs1, pairs2)):
        # do stuff

Next, pattern-matching may be cool, but a nested pattern to decompose right in the middle of a complicated expression is a little too much. So, let's break it up, and un-nest things by borrowing flatten from the itertools docs again:

with open(File1) as file1, open(File2) as file2:
    pairs1 = grouper(2, file1)
    pairs2 = grouper(2, file2)
    zipped_pairs = itertools.izip(pairs1, pairs2)
    for i, zipped_pair in enumerate(zipped_pairs):
        f1_line1, f1_line2, f2_line1, f2_line2 = flatten(zipped_pair)
        # do stuff

The advantage of this solution is that it's abstract and generic, which means if you later decide you need groups of 5 lines, or 3 files, the change is obvious.

The disadvantage of this solution is that it's abstract and generic, which means it can't possibly be as simple as doing the concrete equivalent. (For example, if you didn't zip up a pair of groupers, you wouldn't have to flatten the result.)
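
For anyone who wants to try this, here is a self-contained Python 3 sketch with small versions of the two helpers. They are written in the spirit of the itertools recipes rather than copied from them: grouper keeps the (n, iterable) argument order used in the calls above, whereas the recipe in the current Python 3 documentation takes the iterable first. The sample data and the StringIO stand-ins for files are assumptions:

```python
import io
from itertools import chain, zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one with fillvalue."""
    # n references to a single iterator: each output tuple takes n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def flatten(list_of_lists):
    """Flatten one level of nesting."""
    return chain.from_iterable(list_of_lists)

# StringIO objects stand in for the two open files (hypothetical data).
file1 = io.StringIO("A1\told1\nA1\told2\nB1\told3\nB1\told4\n")
file2 = io.StringIO("A1\tnew1\nA1\tnew2\nB1\tnew3\nB1\tnew4\n")

rows = []
pairs1 = grouper(2, file1)
pairs2 = grouper(2, file2)
for i, zipped_pair in enumerate(zip(pairs1, pairs2)):
    f1_line1, f1_line2, f2_line1, f2_line2 = flatten(zipped_pair)
    rows.append((i, f1_line1, f1_line2, f2_line1, f2_line2))

print(rows)
```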

Upvotes: 1

Pavel Anossov

Reputation: 62948

How about this:

with open('a') as A, open('b') as B:
    while True:
        try:
            lineA1, lineA2, lineB1, lineB2 = next(A), next(A), next(B), next(B)
            # compare lines
            # ...
        except StopIteration:
            break
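
A sketch of how the comparisons from the question's pseudocode might slot into this loop, written for Python 3. The function name compare_files is hypothetical, and "compare" is assumed here to mean plain string equality, which the question does not specify:

```python
import io

def compare_files(file_a, file_b):
    """Return one tuple of equality results per 4-line group."""
    results = []
    while True:
        try:
            a1, a2 = next(file_a), next(file_a)
            b1, b2 = next(file_b), next(file_b)
        except StopIteration:
            break  # either file ran out; an incomplete trailing group is dropped
        # The four comparisons listed in the question's pseudocode.
        results.append((a1 == a2, a1 == b1, a2 == b1, a2 == b2))
    return results

# StringIO objects stand in for the two open files (hypothetical data).
file_a = io.StringIO("A1\tx\nA1\tx\nB1\ty\nB1\tz\n")
file_b = io.StringIO("A1\tx\nA1\tw\nB1\ty\nB1\tz\n")

results = compare_files(file_a, file_b)
print(results)
```

Keeping the comparisons outside the try block means a StopIteration raised by anything other than the next() calls is not swallowed by accident.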

Upvotes: 6
