Matheus Calvelli

Reputation: 37

How to remove lines with repeated value from a text file

I have a text file with various codes in a column (one code per line), and some of them appear more than once (always consecutively). I would like to know how I can remove the lines with repeated values.

Example: File1.dat

84578    
84581    
84627    
84761    
84761    
84792    
84792   
84792    
84886    
84886    
84905    
84905    
84905

I would like the output to be:

84578    
84581    
84627    
84761    
84792    
84886    
84905

Note: in my file there are no empty lines between the codes. Any solution would do: scripts, terminal commands, etc. Thanks in advance.

Upvotes: 0

Views: 64

Answers (2)

shove

Reputation: 79

file = open("FileWithDublicates.txt","r");
lines = file.readlines()
lines = set(lines)
file.close
file = open("FileWithDublicates.txt","w");
for line in lines:
    file.write(line)

This should do the trick. Note, though, that a blank line will also exist only once in the output, and a set does not preserve the original line order.
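If the original order matters, here is a minimal sketch (my own addition, not part of the original answer) that keeps the first occurrence of each line, relying on dict preserving insertion order in Python 3.7+:

# Hypothetical sketch: dedupe while preserving the order of first occurrences.
with open("FileWithDuplicates.txt") as f:
    unique_lines = dict.fromkeys(f)   # keys keep insertion order

with open("FileWithDuplicates.txt", "w") as f:
    f.writelines(unique_lines)        # iterating a dict yields its keys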

Upvotes: -1

Jean-François Fabre

Reputation: 140186

Since the duplicate lines are consecutive, on Linux/MSYS you can simply use uniq.

Output with your data:

$ uniq lines.txt
84578
84581
84627
84761
84792
84886
84905
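To write the result to a new file instead of the terminal (uniq.txt is just an assumed output name; uniq accepts an optional output file argument):

$ uniq lines.txt uniq.txt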

A Python solution using a generator expression that emits a line to the output file only if it is the first line or differs from the previous one:

with open("lines.txt") as fr,open("uniq.txt","w") as fw:
    for line in (x for i,x in enumerate(fr) if i==0 or lines[i-1]!=x):
        fw.write(line)
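An equivalent standard-library approach (my own sketch, not from the original answer) uses itertools.groupby, which groups runs of identical consecutive lines, so writing one key per group behaves like uniq:

from itertools import groupby

# groupby collapses each run of identical consecutive lines into one group;
# writing only the group key keeps a single copy per run.
with open("lines.txt") as fr, open("uniq.txt", "w") as fw:
    fw.writelines(key for key, _ in groupby(fr))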

Upvotes: 2
