Leustad

Reputation: 143

How to compare two files and extract some data with Python

I have two files, file1 and file2. file2 contains every entry in file1, plus many more. Example:

file1:
data1/111 
data2/222 
data3/333 

file2:
data1/111 \ewr\xcgf\wer 54645623456.xml
data23/42234 \asdqw\aqerf 23525.xml
data2/222 \asd\qwe 234234.xml
data66/2331 \a53\fdf355 12312333311.xml
data3/333 \from\where 123123.xml
data4/444 \xcv\sdf\ghf 98546.xml 
and MANY more...

I'm trying to print the lines that exist in both files, but taken from file2. That means each printed line must include the extra data from file2, i.e. the path and the XML file name.

I've tried:

lines1 = open(path1).readlines()
lines2 = open(path2).readlines()

for i in lines1:
    for j in lines2:
        if i in j:
            print(j.rstrip())

This prints all the lines in lines2. What I actually want is this: take the first line from lines1, search for it in lines2, and if it is found anywhere in lines2, print the matching line from lines2. Then do the same for the second line in lines1, and so on.

Can anyone help?

Thank you for your time.

Upvotes: 1

Views: 140

Answers (3)

Leustad

Reputation: 143

I found a solution for the cross-check:

lines1 = open(path1).readlines()
lines2 = open(path2).readlines()

for i in lines1:
    for j in lines2:
        if j.startswith(i.rstrip()):
            print(j.rstrip())
            break

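What this does: it checks each line from lines1 against all the lines in lines2 and prints the first line in lines2 that starts with it. The break stops the inner loop after the first match, which prevents duplicate output.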
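
For larger files, the nested loop can be replaced by a set lookup. The following is only a sketch under two assumptions that go beyond the question: every line in file1 holds just the key, and the key is the first whitespace-separated field of each line in file2. The name matching_lines and the sample lists are made up for illustration. It also skips blank lines in file1, because an empty string is a prefix of every line and would otherwise match.

```python
def matching_lines(key_lines, data_lines):
    """Return the lines of data_lines whose first field is a key in key_lines.

    Assumption: the key is the first whitespace-separated field of a line.
    """
    # Build the key set once; blank lines are ignored.
    keys = {line.strip() for line in key_lines if line.strip()}
    matches = []
    for line in data_lines:
        fields = line.split(None, 1)  # split off the first field only
        if fields and fields[0] in keys:
            matches.append(line.rstrip())
    return matches


# Sample data copied from the question (raw strings keep the backslashes).
file1 = ["data1/111 ", "data2/222 ", "", "data3/333 "]
file2 = [
    r"data1/111 \ewr\xcgf\wer 54645623456.xml",
    r"data23/42234 \asdqw\aqerf 23525.xml",
    r"data2/222 \asd\qwe 234234.xml",
    r"data66/2331 \a53\fdf355 12312333311.xml",
    r"data3/333 \from\where 123123.xml",
    r"data4/444 \xcv\sdf\ghf 98546.xml",
]

for line in matching_lines(file1, file2):
    print(line)
```

Unlike the loop above, this prints matches in the order they appear in file2 and compares whole keys, so data1/11 would not match data1/111.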

Upvotes: 0

enrico.bacis

Reputation: 31494

The question is not really clear, but if you know both files have the same lines in the same order, with extra data on some lines in file2, you can do the following for an O(n) solution:

lines1 = open(path1).readlines()
lines2 = open(path2).readlines()

for line1, line2 in zip(lines1, lines2):
    if line1 != line2:
        print(line2.rstrip())

Upvotes: 1

Joe

Reputation: 2564

lines1 = open(path1).readlines()
lines2 = open(path2).readlines()

for l1 in lines1:
    if l1 in lines2:
        print(l1)

Or using a list comprehension:

lines1 = open(path1).readlines()
lines2 = open(path2).readlines()
print([line for line in lines1 if line in lines2])

Upvotes: 1
