Reputation: 37
I have a text file with various codes (one code per line) in a single column, and some of them appear more than once (the duplicates are always consecutive). How can I remove the lines with repeated values?
Example: File1.dat
84578
84581
84627
84761
84761
84792
84792
84792
84886
84886
84905
84905
84905
I would like the output to be:
84578
84581
84627
84761
84792
84886
84905
Note: there are no blank lines between the values in my file. Any solution would do: a script, a terminal command, etc. Thanks in advance.
Upvotes: 0
Views: 64
Reputation: 79
# Read every line, then use a set to drop the duplicates
file = open("FileWithDuplicates.txt", "r")
lines = file.readlines()
lines = set(lines)
file.close()

# Overwrite the file with the unique lines
file = open("FileWithDuplicates.txt", "w")
for line in lines:
    file.write(line)
file.close()
This should do the trick. Note, however, that a set does not preserve the original order of the lines, and if the file contained blank lines only one of them would survive.
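If the original order matters (as it does in the example data), here is a minimal sketch of an order-preserving variant using dict.fromkeys, which keeps first-occurrence order on Python 3.7+ (the filename is the same assumed one as above):

# Order-preserving de-duplication: dict.fromkeys keeps each line's first occurrence
with open("FileWithDuplicates.txt") as f:
    unique_lines = list(dict.fromkeys(f))

with open("FileWithDuplicates.txt", "w") as f:
    f.writelines(unique_lines)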
Upvotes: -1
Reputation: 140186
Since the duplicate lines are consecutive, with Linux/MSYS you can simply use uniq.
Output with your data:
$ uniq lines.txt
84578
84581
84627
84761
84792
84886
84905
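To save the result instead of printing it to the terminal, uniq also accepts an output file as a second argument (uniq.txt is just an assumed output name here):

$ uniq lines.txt uniq.txt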
A Python solution, using a generator comprehension that emits a line only if it is the first line or differs from the previous one, and writes it to the output file:
with open("lines.txt") as fr, open("uniq.txt", "w") as fw:
    # Read all lines so each one can be compared with its predecessor
    lines = fr.readlines()
    for line in (x for i, x in enumerate(lines) if i == 0 or lines[i - 1] != x):
        fw.write(line)
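Another short option, not from the answer itself but a common idiom for collapsing consecutive duplicates, is itertools.groupby (same assumed filenames):

from itertools import groupby

with open("lines.txt") as fr, open("uniq.txt", "w") as fw:
    # groupby clusters consecutive equal lines; write one line per cluster
    for line, _group in groupby(fr):
        fw.write(line)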
Upvotes: 2