user3834663

Reputation: 553

Print the duplicate lines in a file using awk

I have a requirement to print all the duplicated lines in a file, but the uniq -D option is not supported on my system, so I am looking for an alternative way to print the duplicate lines using awk. I know of an awk one-liner like the one below.

testfile.txt

apple
apple
orange
orange
cherry
cherry
kiwi
strawberry
strawberry
papaya
cashew
cashew
pista

The command:

awk 'seen[$0]++' testfile.txt

But the above suppresses the first occurrence of each line, so each pair of duplicates is printed only once. I need the same output that the uniq -D command produces, like this:

apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
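For reference, this is what that one-pass filter actually emits on the sample file — only the second and later occurrences of each line, so each pair shows up once:

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so awk
# prints only the 2nd and later occurrences of each line.
printf 'apple\napple\norange\norange\ncherry\ncherry\nkiwi\nstrawberry\nstrawberry\npapaya\ncashew\ncashew\npista\n' > testfile.txt
awk 'seen[$0]++' testfile.txt
# apple
# orange
# cherry
# strawberry
# cashew
```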

Upvotes: 11

Views: 19423

Answers (7)

potong

Reputation: 58528

This might work for you (GNU sed):

sed -rn ':a;N;/^([^\n]*)\n\1$/p;//ba;/^([^\n]*)(\n\1)+$/P;//ba;s/.*\n//;ba' file

Read two lines into the pattern space (PS). If those two lines are duplicates, print both, loop back, and read a third line. If the third or a subsequent line is also a duplicate, print the first line of the PS and loop back for another. Otherwise, remove all but the last line of the PS and loop back to read the next line, and so on.

Upvotes: 1

Benjamin W.

Reputation: 52451

With sed:

$ sed 'N;/^\(.*\)\n\1$/p;$d;D' testfile.txt
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew

This does the following:

N                 # Append next line to pattern space
/^\(.*\)\n\1$/p   # Print if lines in pattern space are identical
$d                # Avoid printing lone non-duplicate last line
D                 # Delete first line in pattern space

There are a few limitations:

  • It only works for contiguous duplicates, i.e., not for

    apple
    orange
    apple
    
  • Lines appearing more than twice in a row throw it off.
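A quick sketch of that second limitation: with three identical lines, the N/D cycle compares the middle line against both of its neighbours, so each matching pair is printed and a triple comes out as four lines:

```shell
# p prints each matching pair, and D feeds the kept line into the
# next comparison, so three a's in a row are printed as four a's.
printf 'a\na\na\nb\n' | sed 'N;/^\(.*\)\n\1$/p;$d;D'
# a
# a
# a
# a
```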

Upvotes: 1

dawg

Reputation: 104082

You can do:

$ uniq -d file | awk '1;1'
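One caveat worth noting: awk '1;1' simply prints every input line twice, so this matches uniq -D output only when each duplicated line occurs exactly twice — a line appearing three times still comes out as two copies:

```shell
# uniq -d emits one copy per group of adjacent duplicates; '1;1'
# then doubles it, regardless of how many times the line occurred.
printf 'a\na\na\nb\nb\n' | uniq -d | awk '1;1'
# a
# a
# b
# b
```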

Upvotes: 0

Ed Morton

Reputation: 204446

No need to parse the file twice:

$ awk 'c[$0]++; c[$0]==2' file
apple
apple
orange
orange
cherry
cherry
strawberry
strawberry
cashew
cashew
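A short breakdown of why this works, sketched on a smaller input: c[$0]++ is false only on a line's first occurrence, so it prints occurrences 2, 3, …; c[$0]==2 fires exactly once, on the second occurrence, printing the extra copy that stands in for the suppressed first one:

```shell
# n copies of a line in => n copies out, in a single pass:
# occurrence 1: both conditions false, nothing printed
# occurrence 2: printed by c[$0]++ AND by c[$0]==2 (two copies)
# occurrence 3+: printed once by c[$0]++
printf 'a\na\na\nb\nc\nc\n' | awk 'c[$0]++; c[$0]==2'
# a
# a
# a
# c
# c
```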

Upvotes: 17

glenn jackman

Reputation: 247092

If you want to stick with just plain awk, you'll have to process the file twice: once to generate the counts, and once to eliminate the lines with a count of 1:

awk 'NR==FNR {count[$0]++; next} count[$0]>1' testfile.txt testfile.txt
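The NR==FNR test is true only while the first copy of the file is being read (FNR resets to 1 for each input file), which is what separates the counting pass from the printing pass. Unlike the adjacent-only sed/uniq approaches, this also catches non-adjacent duplicates:

```shell
# pass 1 (NR==FNR): tally each line; 'next' skips the print rule
# pass 2: print every line whose total count exceeds 1
printf 'x\ny\nx\nz\n' > nonadj.txt
awk 'NR==FNR {count[$0]++; next} count[$0]>1' nonadj.txt nonadj.txt
# x
# x
```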

Upvotes: 7

Charlie Martin

Reputation: 8406

Note: key on the whole line ($0) rather than the first field, and test membership with the in operator, so lines containing spaces (or lines like "0" that evaluate as false) are handled correctly:

awk '{if ($0 in x) { x_count[$0]++; print $0; if (x_count[$0] == 1) { print x[$0] } } x[$0] = $0}' testfile.txt

Upvotes: 0

Lars Fischer

Reputation: 10209

Something like this, if your uniq supports -d? (-F and -x make grep treat the duplicated lines as fixed, whole-line strings rather than regular expressions.)

grep -Fx -f <(uniq -d testfile.txt) testfile.txt
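A small demonstration of why fixed whole-line matching matters here (note that uniq -d, like uniq -D, only detects adjacent duplicates; the <(...) process substitution requires bash):

```shell
# Without -F, the '.' in 'a.c' is a regex wildcard and would also
# match 'abc'; -x restricts matches to whole lines.
printf 'a.c\na.c\nabc\n' > dup.txt
grep -Fx -f <(uniq -d dup.txt) dup.txt
# a.c
# a.c
```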

Upvotes: 0
