Reputation: 1362

AWK - print only duplicates

I have a file:

jeden
dwa
jeden
trzy
trzy
cztery
piec
jeden

This command prints out:

$ awk 'BEGIN {while ((getline < "file") > 0) if(a[$0]++) print }'
jeden
trzy
jeden

I want to print all duplicates, including the first occurrence of each:

jeden
jeden
trzy
trzy
jeden

EDIT:

I found an example that works.

awk '{if (x[$1]) { x_count[$1]++; print $0; if (x_count[$1] == 1) { print x[$1] } } x[$1] = $0}' file
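For reference, here is the one-liner from the edit spread over several lines with comments and run against the sample file. The awk logic is unchanged. Writing the sample file and wrapping the command in a helper named print_dups_found (a hypothetical name) are additions for illustration only.

```shell
# Sample input from the question.
printf '%s\n' jeden dwa jeden trzy trzy cztery piec jeden > file

# Hypothetical helper wrapping the one-liner from the edit.
print_dups_found() {
    awk '
    {
        if (x[$1]) {                # a non-empty, non-zero line was stored for $1 earlier
            x_count[$1]++           # number of repeats of $1 seen so far
            print $0                # print the repeated line
            if (x_count[$1] == 1)   # first repeat of $1:
                print x[$1]         #   also print the stored earlier line
        }
        x[$1] = $0                  # store the latest line for $1
    }' "$1"
}

print_dups_found file
```

One quirk: if (x[$1]) tests the stored value rather than whether the key exists. A remembered line that is empty or looks numerically like zero is therefore treated as "not seen". Testing ($1 in x) instead would avoid that.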

I want to do the same, but with getline.

Upvotes: 2

Answers (3)

Dennis Williamson

Reputation: 360485

awk 'BEGIN {while ((getline < "file") > 0) { a[$0]++; if(a[$0] == 2) print; if (a[$0] >= 2) print }}'

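When the count reaches two, the first test prints the line. The second test prints the line whenever the count is two or more. So on the second occurrence the line is printed twice, to "catch up" for the first occurrence, which was skipped because at that point it was not yet known to be a duplicate.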
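If getline is not a hard requirement, the same counting trick can be written as two pattern-only rules, letting awk read the file itself. This is a sketch rather than part of the answer, and the helper name print_all_dups is hypothetical.

```shell
# Sample input from the question.
printf '%s\n' jeden dwa jeden trzy trzy cztery piec jeden > file

# Hypothetical helper: same logic as the getline version, without BEGIN/getline.
print_all_dups() {
    awk '
        ++count[$0] == 2    # second occurrence: print once extra to "catch up" for the first
        count[$0] >= 2      # second and later occurrences: print the line itself
    ' "$1"
}

print_all_dups file
```

A pattern with no action prints the current line when it is true, so no explicit print is needed.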

Upvotes: 3

Kevin

Reputation: 56129

You'll need to either store all lines in memory or take a second pass through the file. It's probably easier to do the first, and unless it's a massive file, you probably have the memory for it. You can stuff this onto one line, of course, but for ease of understanding here it is as a file.

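#!/usr/bin/awk -f

{
        lines[NR] = $0          # keep every line, in input order
        counts[$0]++            # count occurrences of each line
}

END {
        # NR still holds the number of the last record here, and
        # lines[] is indexed from 1 (NR starts at 1), so loop over 1..NR.
        for (i = 1; i <= NR; i++) {
                if (counts[lines[i]] > 1) {
                        print lines[i]
                }
        }
}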
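The second-pass option mentioned above can be sketched by naming the file twice. While awk reads the first copy, NR equals FNR, so lines are only counted; while it reads the second copy, each line is printed if its count is above one. The helper name print_dups_two_pass is hypothetical.

```shell
# Sample input from the question.
printf '%s\n' jeden dwa jeden trzy trzy cztery piec jeden > file

# Hypothetical helper: two passes over the same file instead of storing all lines.
print_dups_two_pass() {
    awk '
        NR == FNR { count[$0]++; next }   # pass 1: count every line
        count[$0] > 1                     # pass 2: print lines seen more than once
    ' "$1" "$1"
}

print_dups_two_pass file
```

This keeps only one counter per distinct line in memory, at the cost of reading the file twice, and it preserves input order.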

Also, your original would be more concisely written as this:

$ awk 'a[$0]++' file

Upvotes: 1

potong

Reputation: 58498

This might work for you:

awk '{a[$1]++}END{for(x in a)if(a[x]>1)for(i=1;i<=a[x];i++)print x}' file
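This answer groups the duplicates instead of keeping input order, and for (x in a) visits keys in an order that awk does not specify. The following sketch keeps the grouping but lists the groups in order of first appearance, so the output is deterministic. The helper name print_dups_grouped is hypothetical.

```shell
# Sample input from the question.
printf '%s\n' jeden dwa jeden trzy trzy cztery piec jeden > file

# Hypothetical helper: grouped output, groups ordered by first appearance.
print_dups_grouped() {
    awk '
        !($0 in count) { order[++n] = $0 }   # first time this line appears: remember its position
        { count[$0]++ }                      # count every occurrence
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    for (j = 1; j <= count[order[i]]; j++)
                        print order[i]
        }
    ' "$1"
}

print_dups_grouped file
```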

Upvotes: 0
