Sorting array in shell using awk

Question

I need to sum the values in this file per name, so that each name appears only once, and sort the result in descending order:

Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404

So my result should be

Bob 102
Mike 96
Joe 7

What I have now is this

awk '{if($3 == 404) arr[$1]+=$2}END{for(i in arr)print i, arr[i]}' file

I know that there is sort -d, but how do I use it in awk?

UPDATE

awk 'BEGIN{FS=" "}{if($9 == 404) arr[$1]+=1}END{for(i in arr) print arr[i] | sort -k2nr }' input > output

I get this result

sh: 0:  not found

And my output file is now empty.

mklement0 · Accepted Answer

Reuben L.'s answer contains the right pointers, but doesn't spell out the full solutions:

The POSIX-compliant solution spelled out:

You need to pipe the output from awk to the sort utility, outside of awk:

awk '{ if($3 == 404) arr[$1]+=$2 } END{ for (i in arr) print i, arr[i] }' input |
  sort -rn -k2,2 > output

Note the specifics of the sort command:

-r performs reverse sorting
-n performs numeric sorting
-k2,2 sorts by the 2nd whitespace-separated field only
- By contrast, specifying only -k2 would sort from the 2nd field through the end of the line. That makes no difference here, since the 2nd field is the last field, but it is an important distinction in general.
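
To see the difference between -k2 and -k2,2 in practice, here is a small sketch using made-up three-field input (not the question's data), with LC_ALL=C set so that the comparison is byte-wise and reproducible. Without -n, the key for -k2 extends to the end of the line, so the 3rd field influences the order; with -k2,2, lines whose 2nd fields are equal fall back to a comparison of the whole line:

```shell
# Hypothetical sample data: both lines have the same 2nd field.
data='a 1 z
b 1 y'

# Key = 2nd field only; the tie is broken by comparing the whole lines.
only_field2=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# Key = 2nd field through end of line, so "1 y" sorts before "1 z".
from_field2=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

printf '%s\n---\n%s\n' "$only_field2" "$from_field2"
```

The first command keeps the lines in their original order, while the second swaps them.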

Note that there's really no benefit to using the nonstandard -V option to get numeric sorting, as -n will do just fine; -V's true purpose is to perform version-number sorting.

Note that you could include the sort command inside your awk script, as in for (i in arr) print i, arr[i] | "sort -nr -k2,2" (note the double quotes around the sort command), but there's little benefit to doing so.
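
A minimal sketch of that in-awk variant, using the question's sample data fed in via printf. The command must be a string: in the question's update, the unquoted sort -k2nr is most likely evaluated by awk as the arithmetic expression sort - k2nr on two uninitialized variables, which yields 0, so awk tries to run a command named 0, hence "sh: 0: not found". The cmd variable and the close() call are additions for illustration:

```shell
result=$(printf '%s\n' 'Bob 5 404' 'Mike 3 404' 'Bob 19 404' 'Bob 78 404' 'Mike 93 404' 'Joe 7 404' |
  awk '
    $3 == 404 { arr[$1] += $2 }             # sum the 2nd field per name
    END {
      cmd = "sort -rn -k2,2"                # the command is a quoted string
      for (i in arr) print i, arr[i] | cmd  # every print feeds the same sort process
      close(cmd)                            # wait for sort to finish and flush its output
    }
  ')

printf '%s\n' "$result"
```

Keeping the command in a variable means the same string is used for both the pipe and close(), which awk needs in order to identify the stream.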

The GNU awk asort() solution spelled out:

gawk '
  { if ($3 == 404) arr[$1]+=$2 } # build array
  END{
    for (k in arr) { amap[arr[k]] = k }   # create value-to-key(!) map
    asort(arr, asorted, "@val_num_desc")  # sort values numerically, in descending order
    # print in sort order
    for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i]
  }
' input > output

As you can see, this complicates the solution, because 2 extra arrays must be created:

for (k in arr) { amap[arr[k]] = k } creates the "inverse" of the original array in amap: it uses the values of the original array as keys and the corresponding keys as the values.
asort(arr, asorted, "@val_num_desc") then sorts the original array by its values in descending, numerical order ("@val_num_desc") and stores the result in new array asorted.
- Note that the original keys are lost in the process: asorted keys are now numerical indices reflecting the sort order.
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i] then enumerates asorted by sequential numerical index, which yields the desired sort order; amap[asorted[i]] returns the matching key (e.g., Bob) from the original array for the value at hand.
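
One caveat with the inverse map: if two names end up with the same total, amap[arr[k]] = k keeps only one of them, so one name is printed twice and the other not at all. A possible alternative, sketched here under the assumption that gawk 4.0 or later is available, is to set PROCINFO["sorted_in"] so that for (k in arr) itself iterates in descending value order, with no extra arrays. This is a different technique from the asort() approach above, and sum_sorted is a made-up function name:

```shell
# Sketch only: requires GNU awk 4.0 or later; sum_sorted is a hypothetical name.
sum_sorted() {
  gawk '
    $3 == 404 { arr[$1] += $2 }                # build array, as above
    END {
      PROCINFO["sorted_in"] = "@val_num_desc"  # for-in now visits values in descending numeric order
      for (k in arr) print k, arr[k]
    }
  ' "$@"
}

# Demo with the question's data, run only if gawk is installed.
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' 'Bob 5 404' 'Mike 3 404' 'Bob 19 404' 'Bob 78 404' 'Mike 93 404' 'Joe 7 404' | sum_sorted
fi
```

Because the keys are never discarded, names with equal totals each get their own output line.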
