Reputation: 69
I want to find the first duplicate line in a text file.
The way I usually find duplicate lines in a file is by using uniq, which takes a sorted file, so I:
sort inputfile | uniq -c | sort -nr > outputfile
to count all the duplicates and print in decreasing order.
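For reference, here is that pipeline on some sample data (the file names are just illustrative):

```shell
# Create a small sample file; names inputfile/outputfile are illustrative.
printf 'b\na\nb\nc\na\nb\n' > inputfile

# Count each distinct line, most frequent first.
sort inputfile | uniq -c | sort -nr > outputfile
cat outputfile
# e.g.:
#    3 b
#    2 a
#    1 c
```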
But by sorting before running uniq, I lose track of where in the original file each duplicate occurred, and what I actually want now is the line number of the first duplicate.
Any ideas?
Upvotes: 0
Views: 605
Reputation: 362117
awk '{ if(seen[$0]) { print; exit } seen[$0] = 1 }' file
This keeps a record of every line seen so far and prints the first line that has appeared before, then exits. If you want the line number, print NR as well:
awk '{ if(seen[$0]) { print NR, $0; exit } seen[$0] = 1 }' file
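A quick demonstration (the file name and contents are illustrative):

```shell
# Sample input: "alpha" on line 1 reappears on line 3.
printf 'alpha\nbeta\nalpha\nbeta\n' > file

# Print the line number and text of the first duplicate, then stop.
awk '{ if (seen[$0]) { print NR, $0; exit } seen[$0] = 1 }' file
# prints: 3 alpha
```

The `exit` matters for large files: awk stops reading as soon as the first duplicate is found.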
Upvotes: 7
Reputation: 63481
Since I know Perl, I tend to use it for one-liners:
perl -e 'foreach (<>) { $n++; if ($l{$_}++) { print "$n\n"; last; } }' < infile
This prints to STDOUT the line number of the first duplicate.
Upvotes: 1