Reputation: 3105
I need to sum the second column per name and sort the totals in descending order, so that each name appears only once:
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
So my result should be
Bob 102
Mike 96
Joe 7
What I have now is this
awk '{if($3 == 404) arr[$1]+=$2}END{for(i in arr)print i, arr[i]}' file
I know there is sort -d, but how do I use it from within awk?
UPDATE
awk 'BEGIN{FS=" "}{if($9 == 404) arr[$1]+=1}END{for(i in arr) print arr[i] | sort -k2nr }' input > output
I get this result
sh: 0: not found
And my output file is now empty.
Upvotes: 1
Views: 3651
Reputation: 437953
Reuben L.'s answer contains the right pointers, but doesn't spell out the full solutions:
The POSIX-compliant solution spelled out:
You need to pipe the output from awk to the sort utility, outside of awk:
awk '{ if($3 == 404) arr[$1]+=$2 } END{ for (i in arr) print i, arr[i] }' input |
sort -rn -k2,2 > output
Note the specifics of the sort command:
-r performs reverse sorting
-n performs numeric sorting
-k2,2 sorts by the 2nd whitespace-separated field only
-k2 would sort starting from the 2nd field through the remainder of the line; that doesn't make a difference here, since the 2nd field is the last field, but it's an important distinction in general.
Note that there's really no benefit to using the nonstandard -V option to get numeric sorting, as -n will do just fine; -V's true purpose is to perform version-number sorting.
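To illustrate the -k2 versus -k2,2 distinction mentioned above, here is a minimal sketch. The two-line sample data is hypothetical (it is not from the question), and the sort is lexical rather than numeric so that the effect of the key's end position is visible; LC_ALL=C is set only to make the comparison byte-wise and predictable.

```shell
# Hypothetical sample data: both lines share the same 2nd field ("x").
data='beta x 1
alpha x 2'

# Key runs from field 2 through the end of the line,
# so the 3rd field takes part in the comparison.
through_eol=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# Key is field 2 only; the keys tie, and sort falls back
# to comparing the whole lines.
field_only=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

echo "sort -k2:";   echo "$through_eol"
echo "sort -k2,2:"; echo "$field_only"
```

With -k2 the line ending in 1 comes first; with -k2,2 the tie on the 2nd field is broken by the whole line, so the line starting with alpha comes first.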
Note that you could include the sort command inside your awk script: for(i in arr)print i, arr[i] | "sort -nr -k2,2" (note the double quotes around the sort command), but there's little benefit to doing so.
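For completeness, here is a sketch of the in-awk variant as a full command, run against the sample data from the question. The temporary file and the names sum, name, and cmd are illustrative choices, not part of the original commands. It also suggests a likely explanation for the "sh: 0: not found" error in the question's update: without the quotes, awk treats sort -k2nr as an arithmetic expression on two unset variables, which evaluates to 0, and then tries to run a command named 0.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
EOF

sorted=$(awk '
  $3 == 404 { sum[$1] += $2 }          # total the 2nd column per name
  END {
    cmd = "sort -k2,2nr"               # the command must be a string
    for (name in sum) print name, sum[name] | cmd
    close(cmd)                         # flush the pipe and wait for sort
  }
' "$input")

printf '%s\n' "$sorted"
rm -f "$input"
```

Keeping the command in a variable makes it easy to pass the identical string to close(); as the answer says, though, piping to sort outside of awk is usually simpler.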
The GNU awk asort() solution spelled out:
gawk '
  { if ($3 == 404) arr[$1]+=$2 }  # build array
  END{
    for (k in arr) { amap[arr[k]] = k }  # create value-to-key(!) map
    asort(arr, asorted, "@val_num_desc")  # sort values numerically, in descending order
    # print in sort order
    for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i]
  }
' input > output
As you can see, this complicates the solution, because 2 extra arrays must be created:
for (k in arr) { amap[arr[k]] = k } creates the "inverse" of the original array in amap: it uses the values of the original array as keys and the corresponding keys as the values.
asort(arr, asorted, "@val_num_desc") then sorts the original array by its values in descending, numerical order ("@val_num_desc") and stores the result in new array asorted. asorted keys are now numerical indices reflecting the sort order.
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i] then enumerates asorted by sequential numerical index, which yields the desired sort order; amap[asorted[i]] returns the matching key (e.g., Bob) from the original array for the value at hand.
Note that the inverse map only works if the totals are unique: if two names had the same total, they would collide on the same amap key and one of the names would be lost.
Upvotes: 3
Reputation: 2859
Two possible solutions:
Use gawk and the built-in asort() and asorti() functions.
Pipe the output of your awk command to sort -k2 -Vr. This will sort descending by the second column.
Note: the -V flag is non-standard and is available in GNU sort. Credit to Jonathan Leffler.
Upvotes: 0