Reputation: 107
I have written a program to compare file new1.txt
with new2.txt
and the lines which are there in new1.txt
and not in new2.txt
has to be written to difference.txt file
.
Can someone please have a look and let me know what changes are required in the below given code. The code prints the same value multiple times.
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1 in file1:
for line2 in file2:
if line2 != line1:
NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
Upvotes: 2
Views: 456
Reputation: 877
I always find working with sets makes comparison of two collections easier. Especially because"does this collection contain this" operations runs i O(1), and most nested loops can be reduced to a single loop (easier to read in my opinion).
with open('test1.txt') as file1, open('test2.txt') as file2, open('diff.txt', 'w') as diff:
s1 = set(file1)
s2 = set(file2)
for e in s1:
if e not in s2:
diff.write(e)
Upvotes: 1
Reputation: 87134
You can use basic set operations for this:
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
diffs.writelines(set(f1).difference(f2))
According to this reference, this will execute with O(n) where n is the number of lines in the first file. If you know that the second file is significantly smaller than the first you can optimise with set.difference_update()
. This has complexity O(n) where n is the number of lines in the second file. For example:
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
s = set(f1)
s.difference_update(f2)
diffs.writelines(s)
Upvotes: 0
Reputation: 61
It is because of your for loops.
If I understand well, you want to see what lines in file1 are not present in file2.
So for each line in file1, you have to check if the same line appears in file2. But this is not what you do with your code : for each line in file1, you check every line in file2 (this is right), but each time the line in file2 is different from the line if file1, you print the line in file1! So you should print the line in file1 only AFTER having checked ALL the lines in file2, to be sure the line does not appear at least one time.
It could look like something as below:
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1 in file1:
if line1 not in file2:
NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
Upvotes: 1
Reputation: 8335
If your file is a big one .You could use this.for-else method
:
the else method below the second for loop is executes only when the second for loop completes it's execution with out break that is if there is no match
Modification:
with open('new1.txt') as file1, open('diff.txt', 'w') as NewFile :
for line1 in file1:
with open('new2.txt') as file2:
for line2 in file2:
if line2 == line1:
break
else:
NewFile.write(line1)
For more on for-else method see this stack overflow question for-else
Upvotes: 1
Reputation: 21941
Here's an example using the with
statement, supposing the files are not too big to fit in the memory
# Open 'new1.txt' as f1, 'new2.txt' as f2 and 'diff.txt' as outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
# Read the lines from 'new2.txt' and store them into a python set
lines = set(f2.readlines())
# Loop through each line in 'new1.txt'
for line in f1:
# If the line was not in 'new2.txt'
if line not in lines:
# Write the line to the output file
outf.write(line)
The with
statement simply closes the opened file(s) automatically. These two pieces of code are equal:
with open('temp.log') as temp:
temp.write('Temporary logging.')
# equal to:
temp = open('temp.log')
temp.write('Temporary logging.')
temp.close()
Yet an other way using two set
s, but this again isn't too memory effecient. If your files are big, this wont work:
# Again, open the three files as f1, f2 and outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
# Read the lines in 'new1.txt' and 'new2.txt'
s1, s2 = set(f1.readlines()), set(f2.readlines())
# `s1 - s2 | s2 - s2` returns the differences between two sets
# Now we simply loop through the different lines
for line in s1 - s2 | s2 - s1:
# And output all the different lines
outf.write(line)
Keep in mind, that this last code might not keep the order of your lines
Upvotes: 3
Reputation: 4898
Print to the NewFile, only after comparing with all lines of file2
present = False
for line2 in file2:
if line2 == line1:
present = True
if not present:
NewFile.write(line1)
Upvotes: 0
Reputation: 720
Your loop is executed multiple times. To avoid that, use this:
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1, line2 in izip(file1, file2):
if line2 != line1:
NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
Upvotes: 0
Reputation: 537
For example you got file1: line1 line2
and file2: line1 line3 line4
When you compare line1 and line3, you write to your output file new line (line1), then you go to compare line1 and line4, again they do not equal, so again you print into your output file (line1)... You need to break both for s, if your condition is true. You can use some help variable to break outer for.
Upvotes: 1