Reputation: 95
I am working on a script to compare two text files. To do this I am using Python's set difference(). I create a set from the content of each file, as in the code below, and it works. Well, almost. I found that when a line has no trailing '\n', the comparison does not handle it correctly, regardless of whether the line is in both files. Since I cannot control whether a line will end with '\n', and this seems like a very specific issue, I came here to ask if anyone has faced this before.
with open(files_to_extract, 'r') as file1:
    with open(downloaded_files, 'r') as file2:
        same = set(file1).difference(set(file2))
same.discard('\n')
with open(not_found_files, 'w') as file_out:
    for line in same:
        file_out.write(line)
        print(line)
files_to_extract set:
{'FromXXXX_Time20180630_165129.zip\n', 'FromXXXX_Time20180630_1637344.zip', 'FromXXXX_Time20180630_163734.zip\n', 'FromXXXX_Time20180630_170523.zip\n'}
downloaded_files set:
{'FromXXXX_Time20180630_165129.zip\n', 'FromXXXX_Time20180630_163734.zip\n', 'FromXXXX_Time20180630_170523.zip\n'}
The not_found_files file is empty. It is supposed to contain
FromXXXX_Time20180630_1637344.zip
but that line is discarded.
Is there a way to compare no matter if there is a '\n' or not? Please advise me.
Upvotes: 0
Views: 161
Reputation: 106891
First of all, the FromXXXX_Time20180630_163734.zip\n item in your downloaded_files actually has one fewer 4 than the FromXXXX_Time20180630_1637344.zip in your files_to_extract, so it would not match even if \n were not an issue.
To compare strings with no regard to the trailing \n, all you would need to do here is to strip it from all strings before adding them to the sets:
same = set(map(str.rstrip, file1)).difference(set(map(str.rstrip, file2)))
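As a rough, self-contained illustration of the stripping approach above, here is a minimal sketch. The helper name missing_lines and the io.StringIO stand-ins for the two files are assumptions made for the example, not part of the original script. It uses rstrip("\n") so that only trailing newlines are removed, whereas str.rstrip with no argument also removes other trailing whitespace.

```python
import io


def missing_lines(wanted, downloaded):
    """Return entries in `wanted` that are absent from `downloaded`,
    ignoring any trailing newline on each line.

    Both arguments are iterables of lines, e.g. open file objects.
    (Hypothetical helper, named for this example only.)
    """
    wanted_names = {line.rstrip("\n") for line in wanted}
    downloaded_names = {line.rstrip("\n") for line in downloaded}
    # A blank line becomes '' after stripping; it is not a file name.
    wanted_names.discard("")
    return wanted_names - downloaded_names


# In-memory stand-ins for files_to_extract and downloaded_files.
# Note that the last line of each "file" has no trailing newline.
file1 = io.StringIO(
    "FromXXXX_Time20180630_165129.zip\n"
    "FromXXXX_Time20180630_163734.zip\n"
    "FromXXXX_Time20180630_170523.zip\n"
    "FromXXXX_Time20180630_1637344.zip"
)
file2 = io.StringIO(
    "FromXXXX_Time20180630_165129.zip\n"
    "FromXXXX_Time20180630_163734.zip\n"
    "FromXXXX_Time20180630_170523.zip"
)

result = missing_lines(file1, file2)
print(result)
```

With this input, the entry ending in 170523.zip matches even though only one copy has a newline, and the only name left over is the one with the extra 4.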
Upvotes: 1
Reputation: 121
Yeah, you can compare them by removing the \n...
foo="foo\n"
foo2="foo"
foo=foo.replace('\n','')
foo2=foo2.replace('\n','')
foo==foo2
True
Do that for everything and you've got it.
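One way "do that for everything" might look in practice is sketched below. The normalize helper and the sample file names are made up for illustration, and io.StringIO stands in for the output file. Since the newline is removed before comparing, it is added back explicitly when writing the result.

```python
import io


def normalize(lines):
    """Remove every '\n' from each line and drop lines that end up empty.
    (Hypothetical helper, named for this example only.)"""
    return {line.replace("\n", "") for line in lines} - {""}


# Sample data: some lines end with '\n', some do not.
wanted = ["a.zip\n", "b.zip", "\n", "c.zip\n"]
have = ["a.zip\n", "c.zip"]

# Compare the cleaned-up names; sorted() gives a stable output order.
not_found = sorted(normalize(wanted) - normalize(have))

# Write one name per line, adding the newline back explicitly.
out = io.StringIO()
for name in not_found:
    out.write(name + "\n")

print(out.getvalue(), end="")
```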
Upvotes: 0