Reputation: 69
I want to find the first duplicate line in a text file.
The way I usually find duplicate lines in a file is by using uniq, which takes a sorted file, so I:
sort inputfile | uniq -c | sort -nr > outputfile
to count all the duplicates and print in decreasing order.
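For reference, here is that pipeline on some sample data (the file names are just illustrative):

```shell
# Create a small sample file; names inputfile/outputfile are illustrative.
printf 'b\na\nb\nc\na\nb\n' > inputfile

# Count each distinct line, most frequent first.
sort inputfile | uniq -c | sort -nr > outputfile
cat outputfile
# e.g.:
#    3 b
#    2 a
#    1 c
```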
But by sorting before running uniq, I lose track of where in the original file each duplicate occurred, and what I actually want now is the line number of the first duplicate.
Any ideas?
Upvotes: 0
Views: 605
Reputation: 362117
awk '{ if(seen[$0]) { print; exit } seen[$0] = 1 }' file
This keeps a record of every line seen so far and prints the first line that has appeared before, then exits. If you want the line number, print NR as well:
awk '{ if(seen[$0]) { print NR, $0; exit } seen[$0] = 1 }' file
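A quick demonstration (the file name and contents are illustrative):

```shell
# Sample input: "alpha" on line 1 reappears on line 3.
printf 'alpha\nbeta\nalpha\nbeta\n' > file

# Print the line number and text of the first duplicate, then stop.
awk '{ if (seen[$0]) { print NR, $0; exit } seen[$0] = 1 }' file
# prints: 3 alpha
```

The `exit` matters for large files: awk stops reading as soon as the first duplicate is found.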
Upvotes: 7
Reputation: 63481
Since I know Perl, I tend to use it for one-liners:
perl -e 'foreach (<>) { $n++; if ($l{$_}++) { print "$n\n"; last; } }' < infile
This prints to STDOUT the line number of the first duplicate.
Upvotes: 1