Getting all values of various rows which have the same value in one column with awk

Question

I have a data set (test-file.csv) with three columns:

node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com

I'd like to extend the header with a count column and rename node to nodes. Furthermore, all entries should be sorted by mail (the second column of the output). The count column should hold the number of occurrences of each mail value, and nodes should list the node values of all rows that share that mail (space separated and alphabetically sorted).

This is what I'm trying to achieve:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA

I have this awk command:

awk -F"," '
BEGIN{
  FS=OFS=",";
  printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
    counts[$3]++;     # Increment count of lines.
    contact[$2];      # contact
}
END {
    # Iterate over all third-column values.
    for (x in counts) {
        printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
    }
}
' test-file.csv | sort --field-separator="," --key=2 -n

However, this is my result :-( Only the count of occurrences is correct:

,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes

Any help appreciated!
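A minimal sketch of how the attempt above could be patched, assuming a POSIX awk and sort (`summarize_by_mail` is a hypothetical helper name, not from the thread). It addresses three problems in the original: `contact[$2];` only creates an element keyed by the contact name and stores nothing under the mail key, so `contact[x]` is always empty; `"nodes"` is a string literal, so it is printed verbatim; and because awk prints the header, it goes through `sort` together with the data rows and gets reordered (`-n` also requests a numeric sort, which does not suit mail addresses). In this sketch node names are still joined in input order, not sorted.

```shell
#!/bin/sh
# Hypothetical helper: summarize_by_mail FILE
# Patched version of the question's script (POSIX awk + sort).
summarize_by_mail() {
    # Print the header outside the pipe so sort cannot move it.
    printf '%s\n' 'contact,mail,count,nodes'
    awk '
    BEGIN { FS = OFS = "," }
    NR > 1 {
        counts[$3]++                 # occurrences per mail address
        contact[$3] = $2             # store the contact under the mail key
        if ($3 in nodes)             # append node names, space separated
            nodes[$3] = nodes[$3] " " $1
        else
            nodes[$3] = $1
    }
    END {
        for (m in counts)
            print contact[m], m, counts[m], nodes[m]
    }' "$1" | LC_ALL=C sort -t, -k2,2   # byte-wise sort on the mail field only
}
```

Called as `summarize_by_mail test-file.csv`, it should print the header followed by the data rows ordered by mail.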

anubhava · Accepted Answer

You may use this GNU awk solution:

awk '
BEGIN {
   FS = OFS = ","
   printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
   ++counts[$3]    # number of rows with this mail
   name[$3] = $2   # contact for this mail
   map[$3] = ($3 in map ? map[$3] " " : "") $1   # append node, space separated
}
END {
   # gawk-only: traverse arrays in ascending string order of their indices (mail)
   PROCINFO["sorted_in"]="@ind_str_asc";
   for (k in counts)
       print name[k], k, counts[k], map[k]
}
' test-file.csv

Output:

contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
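`PROCINFO["sorted_in"]` is specific to GNU awk, and the answer joins node names in input order, which happens to be alphabetical in this sample. A portable sketch, assuming a POSIX `tail`, `sort` and awk (`summarize_sorted` is a hypothetical name): sort the data rows by mail and then by node first, so awk only has to join consecutive rows that share a mail address.

```shell
#!/bin/sh
# Hypothetical helper: summarize_sorted FILE
# Nodes within each group come out alphabetically sorted; no gawk extensions.
summarize_sorted() {
    printf '%s\n' 'contact,mail,count,nodes'
    # Drop the input header, then sort by mail (field 3) and node (field 1).
    tail -n +2 "$1" | LC_ALL=C sort -t, -k3,3 -k1,1 |
    awk '
    BEGIN { FS = OFS = "," }
    $3 != mail {                     # a new mail address starts a new group
        if (NR > 1) print contact, mail, count, nodes
        mail = $3; contact = $2; count = 0; nodes = ""
    }
    {
        count++
        nodes = (nodes == "" ? $1 : nodes " " $1)
    }
    END { if (NR > 0) print contact, mail, count, nodes }'
}
```

Because all ordering is done by `sort`, the awk part keeps only the current group in memory instead of one array entry per mail address.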
