I am trying to remove every copy of a duplicated line (not just the extra copies), for example:
STANGHOLMEN_TA02_GT11
STANGHOLMEN_TA02_GT41
STANGHOLMEN_TA02_GT81
STANGHOLMEN_TA02_GT11
STANGHOLMEN_TA02_GT81
Desired result:
STANGHOLMEN_TA02_GT41
I tried this script:
lines_seen = set()
with open("example.txt", "w") as output_file:
    for each_line in open("example2.txt", "r"):
        if each_line not in lines_seen:
            output_file.write(each_line)
            lines_seen.add(each_line)
Unfortunately, it doesn't work as I want: it misses some lines and doesn't remove others. The original file also has blank lines here and there between the entries.
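A small in-memory sketch of the set-based script above helps explain the behaviour. The function name `set_dedup` and the sample lists are illustrative, not from the original. It shows that a single pass with a set keeps the first copy of each repeated line instead of dropping every copy. It also shows one plausible cause of lines "not being removed", assuming the file's last line has no trailing newline: `"STANGHOLMEN_TA02_GT11\n"` and `"STANGHOLMEN_TA02_GT11"` are different strings, so the set never sees them as duplicates.

```python
def set_dedup(lines):
    """Same logic as the one-pass script: keep the first copy of each line."""
    lines_seen = set()
    result = []
    for line in lines:
        if line not in lines_seen:
            result.append(line)
            lines_seen.add(line)
    return result

# Hypothetical sample without newlines: duplicates collapse to one copy each.
sample = [
    "STANGHOLMEN_TA02_GT11",
    "STANGHOLMEN_TA02_GT41",
    "STANGHOLMEN_TA02_GT81",
    "STANGHOLMEN_TA02_GT11",
    "STANGHOLMEN_TA02_GT81",
]
print(set_dedup(sample))
# ['STANGHOLMEN_TA02_GT11', 'STANGHOLMEN_TA02_GT41', 'STANGHOLMEN_TA02_GT81']

# Hypothetical raw file lines: the last line (assumed) lacks "\n",
# so it does not match the earlier copy and all three lines survive.
raw = [
    "STANGHOLMEN_TA02_GT11\n",
    "STANGHOLMEN_TA02_GT41\n",
    "STANGHOLMEN_TA02_GT11",
]
print(set_dedup(raw) == raw)  # True
```

So the set approach only deduplicates; removing every copy needs to know the total count of each line first.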
You need two passes for this to work correctly: with a single pass you can't know whether the current line will be repeated later in the file. Try something like this:
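# count how often each line occurs (ignoring surrounding whitespace)
lines_count = {}
with open('example2.txt', "r") as input_file:
    for each_line in input_file:
        line = each_line.strip()
        if line:  # skip blank lines
            lines_count[line] = lines_count.get(line, 0) + 1
# write only the lines that are not repeated (dicts keep insertion order in Python 3.7+)
with open('example.txt', "w") as output_file:
    for each_line, count in lines_count.items():
        if count == 1:
            output_file.write(each_line + "\n")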
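The same two-pass idea can also be sketched with `collections.Counter` from the standard library. This is an alternative, not the answer's own code. The helper names `remove_all_duplicates` and `filter_file` are hypothetical. The sketch strips surrounding whitespace and drops blank lines, on the assumption that the stray spaces and blank lines in the original file should not count as entries.

```python
from collections import Counter

def remove_all_duplicates(lines):
    """Return the lines that occur exactly once, in their original order.

    Whitespace is stripped and blank lines are ignored, so "X\n" and "X"
    (e.g. a last line without a newline) count as the same entry.
    """
    cleaned = [line.strip() for line in lines]
    cleaned = [line for line in cleaned if line]  # drop blank lines
    counts = Counter(cleaned)  # first pass: count every entry
    return [line for line in cleaned if counts[line] == 1]  # second pass: keep singletons

def filter_file(src, dst):
    """Hypothetical helper: write the non-repeated lines of src to dst."""
    with open(src, "r") as input_file:
        unique = remove_all_duplicates(input_file)
    with open(dst, "w") as output_file:
        for line in unique:
            output_file.write(line + "\n")

# Hypothetical sample mimicking the file: newlines, blank lines, no final newline.
sample = [
    "STANGHOLMEN_TA02_GT11\n",
    "STANGHOLMEN_TA02_GT41\n",
    "\n",
    "STANGHOLMEN_TA02_GT81\n",
    "STANGHOLMEN_TA02_GT11\n",
    "   \n",
    "STANGHOLMEN_TA02_GT81",
]
print(remove_all_duplicates(sample))  # ['STANGHOLMEN_TA02_GT41']
```

With the files from the question, the call would be `filter_file("example2.txt", "example.txt")`. Building the list comprehension over `cleaned` (rather than iterating the counter) keeps the output in file order regardless of dict ordering.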