user3746195

Reputation: 346

AWK Find Duplicate Value in Column 3. Print Entire Line

Why doesn't this work? I've looked for a long time and found some pretty complex solutions, but I'm thinking this can be simplified and reused... sad :'(

Statement

awk -F"\t" '!seen[$3]++'

File

r1c1    r1c2    r1c3
r2c1    r2c2    r2c3
r3c1    r3c2    r3c3
r4c1    r4c2    r3c3
r5c1    r5c2    r5c3

Desired Output

r3c1    r3c2    r3c3
r4c1    r4c2    r3c3
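
What my statement actually prints is one line per unique value in column 3 (the first occurrence of each, i.e. deduplication, the opposite of what I want):

r1c1    r1c2    r1c3
r2c1    r2c2    r2c3
r3c1    r3c2    r3c3
r5c1    r5c2    r5c3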

I also tried storing the lines and printing the groups at the end, but that code adds a stray 0 and 1 to the output:

[user@host]$ awk '{a[$3]=a[$3] $0 RS c[$3]++} END {for (i in c) if (c[i]>1) printf "%s",a[i]}' file
r3c1    r3c2    r3c3
0r4c1   r4c2    r3c3
1[user@host]$

Upvotes: 3

Views: 2825

Answers (3)

RavinderSingh13

Reputation: 133590

The following awk version may also help, in case you want the output in the same order as in Input_file itself (it reads the file twice):

awk 'FNR==NR{a[$3]++;next} a[$3]>1'  Input_file  Input_file
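
Spelled out with comments (a sketch, behaviorally identical to the one-liner above):

awk '
FNR==NR {        # true only on the first read of Input_file (FNR resets per file, NR does not)
    a[$3]++      # first pass: count occurrences of column 3
    next         # skip the second-pass test below
}
a[$3]>1          # second pass: print lines whose column-3 value occurs more than once
' Input_file Input_file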

EDIT: A single-pass version that buffers the lines for each column-3 value and prints the duplicated groups at the end:

awk '{++a[$3];b[$3]=b[$3]?b[$3] ORS $0:$0}END{for(i in a){if(a[i]>1){print b[i]}}}'   Input_file
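
The same logic with comments (a sketch; note that for(i in a) visits keys in an unspecified order, so the groups may not come out in input order):

awk '
{
    ++a[$3]                       # count occurrences of column 3
    b[$3]=b[$3]?b[$3] ORS $0:$0   # buffer each line under its column-3 value
}
END {
    for(i in a)
        if(a[i]>1)                # only values seen more than once
            print b[i]            # print the whole buffered group
}' Input_file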

Upvotes: 2

RomanPerekhrest

Reputation: 92854

Simply with the uniq command:

uniq -f2 -D file
  • -f N - avoid comparing the first N fields
  • -D - print all duplicate lines

The output:

r3c1    r3c2    r3c3
r4c1    r4c2    r3c3

uniq only detects adjacent duplicate lines, so if the file is unsorted, sort it on the key first:

sort -k3 file | uniq -f 2 -D
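
For example, with a hypothetically reshuffled copy of the sample data, sorting makes the duplicate keys adjacent again so uniq can detect them:

$ printf 'r4c1\tr4c2\tr3c3\nr1c1\tr1c2\tr1c3\nr3c1\tr3c2\tr3c3\n' | sort -k3 | uniq -f 2 -D
r3c1    r3c2    r3c3
r4c1    r4c2    r3c3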

Upvotes: 2

James Brown

Reputation: 37414

In awk, a one-pass version that stores the records in a hash (unlike the attempt in the question, the counter increment is a separate statement, so its value never leaks into the stored records):

$ awk '
{
    a[$3]=a[$3] $0 RS        # append each record plus record separator under its key
    c[$3]++                  # count occurrences of column 3
}
END {
    for(i in c)
        if(c[i]>1)           # pick the keys with duplicates
            printf "%s",a[i] # records already end in RS, so no extra newline
}' file
r3c1    r3c2    r3c3
r4c1    r4c2    r3c3

Upvotes: 4
