Reputation: 5732
I would like to remove duplicate entries from a file. The file looks like this:
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd3:mE7YHNejLCviM:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
How can I remove the duplicates from this file by using shell script?
Upvotes: 2
Views: 4634
Reputation: 43
@shadyabhi answer is correct, if the output needs to be redirected to a different file use:
sort -u inFile -o outFile
Upvotes: 0
Reputation: 212238
If you do not want to change the order of the input file, you can do:
$ awk '!v[$0]{ print; v[$0]=1 }' input-file
or, if the file is small enough (less than 4 billion lines, to ensure that no line is repeated 4 billion times), you can do:
$ awk '!v[$0]++' input-file
Depending on the implementation of awk, you may not need to worry about the file being less than 2^32 lines long. The concern is that if you see the same line 2^32 times, you may overflow an integer in the array value, and the 2^32nd instance (or 2^31st) of the duplicate line will be output a second time. In reality, this is highly unlikely to be an issue!
Upvotes: 2
Reputation: 17234
From the sort manpage:
-u, --unique with -c, check for strict ordering; without -c, output only the first of an equal run
sort -u yourFile
should do.
Upvotes: 5