user1778824

Reputation: 369

Comparing 2 files line by line

I have two files of the following form:

file1:
work1
7 8 9 10 11
1 2 3 4  5
6 7 8 9  10

file2:
work2
2 3 4 5 5
2 4 7 8 9
work1
7 8 9 10 11
1 2 4 4  5
6 7 8 9  10
work3
1 7 8 9 10

Now I want to compare the two files, and wherever a header (say, work1) appears in both, compare the sections that follow it and print the line at which a difference is found. E.g.

work1 (file1)
7 8 9 10 11
1 2 3 4  5
6 7 8 9  10

work1 (file2)
7 8 9 10 11
1 2 4 4  5
6 7 8 9  10

Now I want to print the line where the difference occurs, i.e. "1 2 4 4 5".

To do so, I have written the following code:

with open("file1") as r, open("file2") as w:
    for line in r:
        if "work1" in line:
            for line1 in w:
                if "work1" in line1:
                    print "work1"

However, from here on I am confused about how to read both files in parallel. Once the "work1" headers have been matched, how should I read the two files side by side? Can someone please help me with this?

Upvotes: 1

Views: 1018

Answers (2)

AAA

Reputation: 1384

You would probably want to try the itertools module in Python. It contains a function called izip that can do what you need, along with a function called islice. You can iterate through the second file until you hit the header you are looking for, and then slice the file so that it starts at that header.

Here's a bit of the code.

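from itertools import islice, izip  # Python 2; on Python 3 use the built-in zip in place of izip

# Find the index of the first line of file2 that contains the header.
with open('file2') as w:
    for (i, line) in enumerate(w):
        if "work1" in line:
            break  # stop at the first match so i points at the header

# w has been consumed up to the header, so reopen file2 and start at line i.
iter2 = islice(open('file2'), i, None, 1)

f = open('file1')
for (line1, line2) in izip(f, iter2):
    print line1, line2  # Place your comparisons of the two lines here.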

As long as file1 begins with the "work1" header, as in your example, the first pass through the loop gets "work1" on both lines. After that you can compare. izip stops at the shorter of its inputs, and here the rest of f is shorter than the rest of file2, so the loop ends once you hit the end of f.
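To make the comparison step concrete, here is a minimal sketch in Python 3 (so the built-in zip stands in for izip). It assumes every header line starts with "work", which is my reading of the sample data, and the names section_after and diff_section are made up for illustration. The sample files are given as lists of lines so the snippet runs on its own; open file objects could be passed in the same way.

```python
from itertools import dropwhile, islice

def section_after(lines, header):
    """Yield the lines that follow `header`, stopping at the next header."""
    # Skip everything before the header line, then drop the header itself.
    rest = islice(dropwhile(lambda l: l.strip() != header, lines), 1, None)
    for line in rest:
        if line.startswith("work"):  # assumption: headers start with "work"
            break
        yield line.rstrip("\n")

def diff_section(lines1, lines2, header):
    """Return (offset, line1, line2) for each differing pair in the section."""
    pairs = zip(section_after(lines1, header), section_after(lines2, header))
    return [(n, a, b) for n, (a, b) in enumerate(pairs, start=1) if a != b]

file1 = ["work1\n", "7 8 9 10 11\n", "1 2 3 4  5\n", "6 7 8 9  10\n"]
file2 = ["work2\n", "2 3 4 5 5\n", "2 4 7 8 9\n",
         "work1\n", "7 8 9 10 11\n", "1 2 4 4  5\n", "6 7 8 9  10\n",
         "work3\n", "1 7 8 9 10\n"]

for n, a, b in diff_section(file1, file2, "work1"):
    print("line %d of section: %r != %r" % (n, a, b))
```

Because zip stops at the shorter input, extra lines in the longer of the two sections are silently ignored here.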

Hopefully I explained that well.

EDIT: Added import statement.

EDIT: We need to reopen file2. Iterating over a file object in Python consumes it, so after the search loop the original handle no longer starts at the top of the file. islice counts lines from wherever its input currently is, so we pass it a brand new file object.
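A small Python 3 illustration of that point, using io.StringIO as a stand-in for an open file so it runs without anything on disk. The seek(0) call is an alternative to reopening that I am adding here; it is not part of the code above.

```python
import io

# io.StringIO behaves like a text file object opened for reading.
f = io.StringIO("work2\n2 3 4 5 5\nwork1\n7 8 9 10 11\n")

first_pass = [line for line in f]   # reads all four lines
second_pass = [line for line in f]  # nothing left: the iterator is exhausted

f.seek(0)                           # rewind instead of reopening
third_pass = [line for line in f]   # reads all four lines again

print(len(first_pass), len(second_pass), len(third_pass))
```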

Upvotes: 1

Dvx

Reputation: 279

with open('f1.csv') as f1, open('f2.csv') as f2:
    i = 0  # 1-based number of the line pair being compared
    break_needed = False
    while True:
        # Read one line from each file in lockstep.
        r1, r2 = f1.readline(), f2.readline()
        # readline() returns an empty string only at end of file.
        if len(r1) == 0:
            print "eof found for f1"
            break_needed = True
        if len(r2) == 0:
            print "eof found for f2"
            break_needed = True
        if break_needed:
            break
        i += 1
        if r1 != r2:
            print " line %i" % i
            print "file 1 : " + r1
            print "file 2 : " + r2
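For comparison, the same lockstep idea can be sketched in Python 3 with itertools.zip_longest. This is a variation rather than a translation: instead of stopping when one file ends, it reports the leftover lines of the longer input against None. The function name differing_lines is hypothetical. Like the loop above, it compares by position only and does not line up the "work1" headers first.

```python
from itertools import zip_longest

def differing_lines(lines1, lines2):
    """Return (line_number, line1, line2) wherever the inputs differ.

    A missing line (one input ended early) is reported as None.
    """
    diffs = []
    for n, (a, b) in enumerate(zip_longest(lines1, lines2), start=1):
        if a != b:
            diffs.append((n, a, b))
    return diffs

left = ["work1\n", "7 8 9 10 11\n", "1 2 3 4  5\n"]
right = ["work1\n", "7 8 9 10 11\n", "1 2 4 4  5\n", "6 7 8 9  10\n"]

for n, a, b in differing_lines(left, right):
    print(n, repr(a), repr(b))
```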

Upvotes: 0
