nimrod

Reputation: 5732

Shell Script to remove duplicate entries from file

I would like to remove duplicate entries from a file. The file looks like this:

xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd3:mE7YHNejLCviM:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc

How can I remove the duplicates from this file using a shell script?

Upvotes: 2

Views: 4634

Answers (3)

bhardwajhp

Reputation: 43

@shadyabhi's answer is correct. If the output needs to be redirected to a different file, use:

sort -u inFile -o outFile

Upvotes: 0

William Pursell

Reputation: 212238

If you do not want to change the order of the input file, you can do:

$ awk '!v[$0]{ print; v[$0]=1 }' input-file

or, if the file is small enough (fewer than 2^32 lines, so that no single line can be repeated 2^32 times), you can do:

$ awk '!v[$0]++' input-file

Depending on the implementation of awk, you may not need to worry about the file being less than 2^32 lines long. The concern is that if you see the same line 2^32 times, you may overflow an integer in the array value, and the 2^32nd instance (or 2^31st) of the duplicate line will be output a second time. In reality, this is highly unlikely to be an issue!
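As a quick sanity check, the order-preserving awk one-liner can be run against the sample data from the question (written here to a hypothetical temp file, `/tmp/dedup-input.txt`):

```shell
# Recreate the sample input from the question.
cat > /tmp/dedup-input.txt <<'EOF'
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd3:mE7YHNejLCviM:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd1:5!b4RlH/IgYzI:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
xyabcd4:kQiRgQTU20Y0I:cvsabc
xyabcd2:JXfFZCZrL.6HY:cvsabc
EOF

# Keep only the first occurrence of each line, in original order.
awk '!v[$0]++' /tmp/dedup-input.txt
# → xyabcd1:5!b4RlH/IgYzI:cvsabc
# → xyabcd2:JXfFZCZrL.6HY:cvsabc
# → xyabcd3:mE7YHNejLCviM:cvsabc
# → xyabcd4:kQiRgQTU20Y0I:cvsabc
```

Each duplicate is dropped while the first occurrence of every line keeps its original position.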

Upvotes: 2

shadyabhi

Reputation: 17234

From the sort manpage:

-u, --unique
       with -c, check for strict ordering; without -c, output only the first of an equal run

sort -u yourFile

should do.
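One caveat worth noting (not stated in the answer itself): `sort -u` sorts the file as a side effect, so the original line order is not preserved. A minimal sketch with a hypothetical sample file where input order differs from sorted order:

```shell
# Sample where the first-seen order is b, a.
printf 'b\na\nb\na\n' > /tmp/sortu-sample.txt

# sort -u emits unique lines in sorted order, not input order.
sort -u /tmp/sortu-sample.txt
# → a
# → b
```

If the original order matters, use the awk approach from the other answer instead.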

Upvotes: 5
