user39617

Reputation: 11

Can someone please explain this Awk script?

This script finds duplicate entries in the first column and prints the corresponding second-column entries, grouped by first-column value. I would like to know how the script achieves this.

awk '{c[$1]++; k[$1]=k[$1] " " $2} END {for (i in c) {if (c[i]>1) print k[i]}}'

Upvotes: 0

Views: 83

Answers (2)

Ed Morton

Reputation: 204648

If whoever wrote it had just used meaningful variable names and indentation, I bet you wouldn't even have had to ask:

awk '
{
    count[$1]++
    values[$1] = values[$1] " " $2
}
END {
    for (key in count) {
        if (count[key] > 1) {
            print values[key]
        }
    }
}
'

It could've been written better with a ternary expression as:

awk '
{ values[$1] = (count[$1]++ ? values[$1] " " : "") $2 }
END {
    for (key in count) {
        if (count[key] > 1) {
            print values[key]
        }
    }
}
'

to avoid having a leading or trailing blank, and there are a few other minor improvements that could be made too.
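To make the difference between the two forms visible, here is a small sketch that runs both on the same input and wraps each printed line in square brackets so the blank shows up. The sample input and the shell variable names (`input`, `original`, `ternary`) are my own assumptions for illustration, not part of the question.

```shell
#!/bin/sh
# Hypothetical sample input: key in column 1, value in column 2.
input='key1 data1
key1 data2
key2 data3'

# Original form: every value is appended with a blank in front of it,
# so the accumulated string always starts with a blank.
original=$(printf '%s\n' "$input" | awk '
{ count[$1]++; values[$1] = values[$1] " " $2 }
END { for (key in count) if (count[key] > 1) print "[" values[key] "]" }
')

# Ternary form: count[$1]++ yields 0 (false) the first time a key is seen,
# so the separator is only added from the second value onward.
ternary=$(printf '%s\n' "$input" | awk '
{ values[$1] = (count[$1]++ ? values[$1] " " : "") $2 }
END { for (key in count) if (count[key] > 1) print "[" values[key] "]" }
')

printf 'original: %s\n' "$original"   # original: [ data1 data2]
printf 'ternary:  %s\n' "$ternary"    # ternary:  [data1 data2]
```

The brackets are only there for the demonstration; drop them to get the plain output.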

Upvotes: 2

James Brown

Reputation: 37464

{
    c[$1]++             # count occurrences of each first-field value
    k[$1]=k[$1] " " $2  # concatenate the second fields seen for that value
  # k[$1]=k[$1] $2 " "  # alternative: trailing blank instead of a leading one
}
END {                   # after counting and concatenating
    for (i in c) {      # go through all first-field values
        if (c[i]>1)     # and, for those seen more than once,
            print k[i]  # print their concatenated second fields
    }
}

For example:

key1 data1
key1 data2
key2 data3

would produce output:

 data1 data2
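To see how the two arrays fill up, a trace along these lines may help. It is only a sketch: it reuses the example input above and adds a `printf` after each record, and the shell variable `trace` is a name I made up for the demonstration.

```shell
#!/bin/sh
# Feed the example input to the main block only and print the state of
# c[] and k[] for the current key after each record is processed.
trace=$(printf 'key1 data1\nkey1 data2\nkey2 data3\n' | awk '
{
    c[$1]++
    k[$1] = k[$1] " " $2
    printf "line %d: c[%s]=%d k[%s]=\"%s\"\n", NR, $1, c[$1], $1, k[$1]
}')

printf '%s\n' "$trace"
# line 1: c[key1]=1 k[key1]=" data1"
# line 2: c[key1]=2 k[key1]=" data1 data2"
# line 3: c[key2]=1 k[key2]=" data3"
```

By the time END runs, only key1 has a count above 1, so only its string is printed. With several recurring keys, keep in mind that `for (i in c)` visits the keys in an unspecified order.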

Upvotes: 6
