Reputation: 11
This script finds duplicate entries in the first column and prints the corresponding second-column entries, grouped by key. I would like to know how the script achieves this.
awk '{c[$1]++; k[$1]=k[$1] " " $2} END {for (i in c) {if (c[i]>1) print k[i]}}'
Upvotes: 0
Views: 83
Reputation: 204648
If whoever wrote it had just used meaningful variable names and indentation I bet you wouldn't even have to ask:
awk '
    {
        count[$1]++
        values[$1] = values[$1] " " $2
    }
    END {
        for (key in count) {
            if (count[key] > 1) {
                print values[key]
            }
        }
    }
'
It could've been written better with a ternary expression as:
awk '
    { values[$1] = (count[$1]++ ? values[$1] " " : "") $2 }
    END {
        for (key in count) {
            if (count[key] > 1) {
                print values[key]
            }
        }
    }
'
to avoid having a leading or trailing blank, and there are a few other minor improvements that could be made too.
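To make the difference in blanks visible, here is a small sketch that runs both forms on the same input. The sample data and the function names `sample`, `original` and `ternary` are made up for illustration; the brackets added by `sed` are only there to show where the output starts and ends.

```shell
# Hypothetical sample input; the key/data names are illustrative only.
sample() {
    printf '%s\n' 'key1 data1' 'key1 data2' 'key2 data3'
}

# Original form: every value is prefixed with a blank.
original() {
    awk '{ count[$1]++; values[$1] = values[$1] " " $2 }
         END { for (key in count) if (count[key] > 1) print values[key] }'
}

# Ternary form: the separator is added only from the second value onward.
ternary() {
    awk '{ values[$1] = (count[$1]++ ? values[$1] " " : "") $2 }
         END { for (key in count) if (count[key] > 1) print values[key] }'
}

# Wrap each output line in brackets so a leading blank is visible.
sample | original | sed 's/^/[/; s/$/]/'
sample | ternary  | sed 's/^/[/; s/$/]/'
```

The first command should print `[ data1 data2]` and the second `[data1 data2]`: `count[$1]++` evaluates to 0 the first time a key is seen, so the ternary picks the empty string instead of the existing value plus a blank.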
Upvotes: 2
Reputation: 37464
{
    c[$1]++                  # count occurrences of first field entries
    k[$1]=k[$1] " " $2       # catenate second fields for recurring entries
    # k[$1]=k[$1] $2 " "     # this way the output would look better
}
END {                        # after counting and catenating
    for (i in c) {           # go through all entries
        if (c[i]>1)          # and print the catenated second fields for those
            print k[i]       # recurring first fields
    }
}
For example:
key1 data1
key1 data2
key2 data3
would produce the output (preceded by a blank, because every value is prefixed with " "):
data1 data2
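With more than one duplicated key, note that `for (i in c)` visits the keys in an unspecified order, so the groups may not come out in input order. If that matters, one possible variation (a sketch, not part of the original script; the `order` array, the counter `n` and the function name `first_seen_order` are assumptions introduced here) is to remember each key the first time it appears:

```shell
# Print grouped second fields for duplicated keys, in first-seen key order.
first_seen_order() {
    awk '
    {
        if (!($1 in count)) order[++n] = $1    # record each key on first sight
        values[$1] = (count[$1]++ ? values[$1] " " : "") $2
    }
    END {
        for (i = 1; i <= n; i++) {             # walk keys in input order
            key = order[i]
            if (count[key] > 1) print values[key]
        }
    }'
}

# Hypothetical input with two duplicated keys (b, a) and one unique key (c).
printf '%s\n' 'b 1' 'a 2' 'b 3' 'a 4' 'c 5' | first_seen_order
```

This should print `1 3` followed by `2 4`, since `b` is seen before `a` and `c` occurs only once.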
Upvotes: 6