SamSong

Reputation: 69

Find first duplicate line in a file

I want to find the first duplicate line in a text file.

I usually find duplicate lines with uniq, which requires sorted input, so I run:

sort inputfile | uniq -c | sort -nr > outputfile

This counts all the duplicates and prints them in decreasing order of count.

Sorting before running uniq loses the position of each duplicate in the original file, and right now all I want to know is which line is the first duplicate.

Any ideas?

Upvotes: 0

Views: 605

Answers (2)

John Kugelman

Reputation: 362117

awk '{ if(seen[$0]) { print; exit } seen[$0] = 1 }' file

This records each line as it is read, prints the first line that has already been seen, and exits. If you want the line number as well, print NR too (NR is the line number of the repeated occurrence, not of the original):

awk '{ if(seen[$0]) { print NR, $0; exit } seen[$0] = 1 }' file
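If it also helps to know where the line first appeared, one possible variation is to store the line number in the array instead of a flag. This is only a sketch: the temporary file and its sample contents are made up for illustration, and the output format (first occurrence, repeat, text) is an arbitrary choice.

```shell
# Sample input; the file name and contents are made up for illustration.
tmpfile=$(mktemp)
printf '%s\n' apple banana cherry banana apple > "$tmpfile"

# seen[$0] holds the line number at which each line first appeared.
# On the first repeat, print: first-occurrence line number,
# current line number, and the line text, then stop reading.
result=$(awk '$0 in seen { print seen[$0], NR, $0; exit }
              { seen[$0] = NR }' "$tmpfile")

echo "$result"   # prints: 2 4 banana
rm -f "$tmpfile"
```

Using `$0 in seen` rather than testing the stored value means the check does not depend on the value being non-zero, and it avoids creating an empty entry just by looking a line up.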

Upvotes: 7

paddy

Reputation: 63481

Since I know Perl, I tend to use it for one-liners:

perl -e 'while (<>) { $n++; if ($l{$_}++) { print "$n\n"; last; } }' < infile

This prints the line number of the first duplicate (the repeated occurrence) to STDOUT.
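Note that "first duplicate" can be read two ways: the earliest repeat (what the one-liners in this thread report), or the duplicated line whose original appears earliest in the file. A rough sketch of the difference in awk, assuming a made-up four-line sample file; the second command reads the file twice, which is an assumption about what is acceptable for your input size:

```shell
# Sample input; the file name and contents are made up for illustration.
tmpfile=$(mktemp)
printf '%s\n' a b b a > "$tmpfile"

# Reading 1: the earliest line that repeats something already seen.
first_repeat=$(awk 'seen[$0]++ { print NR, $0; exit }' "$tmpfile")

# Reading 2: the earliest line that has a duplicate anywhere in the file.
# First pass (NR == FNR) counts every line; second pass prints the first
# line whose count is greater than one.
earliest_dup=$(awk 'NR == FNR { count[$0]++; next }
                    count[$0] > 1 { print FNR, $0; exit }' "$tmpfile" "$tmpfile")

echo "$first_repeat"   # prints: 3 b
echo "$earliest_dup"   # prints: 1 a
rm -f "$tmpfile"
```

The single-pass form stops at the first repeat and never reads the rest of the file, while the two-pass form has to read the whole file once before it can answer.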

Upvotes: 1
