Sort contents of awk associative array element

The original file looks like this:

1.2.3.4: 1,3,4
1.2.3.5: 9,8,7,6
1.2.3.4: 4,5,6
1.2.3.6: 1,1,1

After my first (incorrect) attempt at sorting it, I have this:

1.2.3.4: 1,3,4,4,5,6,
1.2.3.5: 9,8,7,6,
1.2.3.6: 1,1,1,

I want to sort it into the following format:

1.2.3.4: 1,3,4,5,6
1.2.3.5: 6,7,8,9
1.2.3.6: 1

But how do I access each comma-delimited value within an array element, sort the values in ascending order, and remove duplicates? The only command I have managed to write so far works on the whole element:

awk -F' ' 'NF>1{a[$1] = a[$1]$2","}END{for(i in a){print i" "a[i] | "sort -t: -k1 "}}' c.txt

Upvotes: 0

Views: 622

Answers (1)

Wintermute

Reputation: 44043

EDIT: My first answer took the intermediate data as input, because the original data had not been posted yet, but the same result can of course be produced from the original data directly. Again with GNU awk (the arrays of arrays used here need gawk 4.0 or later):

gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { for(i = 2; i <= NF; ++i) a[$1][$i]; } END { for(ip in a) { line = ip " "; for(n in a[ip]) { line = line n "," } sub(/,$/, "", line); print line } }' filename

The code works as follows:

BEGIN { 
  PROCINFO["sorted_in"] = "@ind_num_asc"  # GNU-specific: sorted array
                                          # traversal
}
{
  for(i = 2; i <= NF; ++i) a[$1][$i]      # remember numbers by ip
}
END {                                     # in the end:
  for(ip in a) {                          # for all ips:
    line = ip " "                         # construct the line: IP
    for(n in a[ip]) {                     # numbers in order
      line = line n ","
    }
    sub(/,$/, "", line)                   # remove trailing comma
    print line                            # print the result.
  }
}
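
If GNU awk is not available, roughly the same result can be had from POSIX awk and sort. Note that this is a different technique from the one above, not a translation of it: the lines are flattened into "ip value" pairs, sort removes the duplicates and orders the values numerically, and a second awk regroups them. A minimal sketch, assuming every line looks like `ip: n,n,n`; the function name `sort_unique_by_ip` is made up for illustration:

```shell
# Hypothetical helper: print each IP once with its values sorted
# numerically and deduplicated. Takes the input file as $1.
sort_unique_by_ip() {
  # 1. Flatten: one "ip: value" pair per line, skipping empty fields
  #    (so a trailing comma does no harm).
  awk -F '[ ,]' '{ for (i = 2; i <= NF; ++i) if ($i != "") print $1, $i }' "$1" |
  # 2. Sort by IP (as a string), then by value (numerically);
  #    -u drops pairs whose keys compare equal.
  LC_ALL=C sort -u -k1,1 -k2,2n |
  # 3. Regroup: start a new output line whenever the IP changes.
  awk '{
         if ($1 != ip) {
           if (NR > 1) printf "\n"
           printf "%s %s", $1, $2
           ip = $1
         } else {
           printf ",%s", $2
         }
       }
       END { if (NR > 0) printf "\n" }'
}
```

Called as `sort_unique_by_ip c.txt`, this should print the three lines asked for in the question. The IPs are compared as plain strings here, so addresses with multi-digit octets will not come out in true numeric order.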

Old answer for intermediate data:

With GNU awk, assuming that the data is formatted precisely as in the question (with a trailing ,):

gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { delete a; for(i = 2; i < NF; ++i) a[$i]; line = $1 " "; for(i in a) line = line i ","; sub(/,$/, "", line); print line; }' filename

The file contents are split along spaces and commas, then the code works as follows:

BEGIN { 
  PROCINFO["sorted_in"] = "@ind_num_asc"  # GNU-specific: sorted array
                                          # traversal, numerically ascending
}
{
  delete a
  for(i = 2; i < NF; ++i) { a[$i] }       # remember the fields in a line.
                                          # duplicates are removed here.
                                          # note that it's < NF instead of
                                          # <= NF because the trailing comma
                                          # leaves us with an empty last
                                          # field.

  line = $1 " "                           # start building line: IP field
  for(i in a) {                           # append numbers separated by
    line = line i ","                     # commas
  }
  sub(/,$/, "", line)                     # remove last trailing comma
  print line                              # print result.
}
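
To see why the loop above stops at `< NF`, it can help to count the fields awk produces with and without the trailing comma. A small sketch using plain awk; the helper name `count_fields` is only for illustration:

```shell
# Hypothetical helper: print the number of fields awk sees in the
# given line when splitting on spaces and commas.
count_fields() {
  printf '%s\n' "$1" | awk -F '[ ,]' '{ print NF }'
}

count_fields '1.2.3.4: 1,3,4,'   # trailing comma: one extra, empty field
count_fields '1.2.3.4: 1,3,4'    # no trailing comma
```

The first call reports one field more than the second, and that extra field is the empty string after the last comma, which is what the `< NF` bound skips.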

Upvotes: 3
