Reputation: 1281
I have the following CSV file, file1.csv:
sales,artist
10,0131
10,0131
10,10_000 Maniacs
10,1000names
15,E1001 Ways
15,E1001 Ways
10,S101 Strings Orchestra
10,D101 Strings Orchestra
10,x0cc
10,x0cc
I am writing a Bash command to find the total sales for each artist, with the output sorted by total sales in descending order.
Expected output:
30,E1001 Ways
20,0131
20,x0cc
10,10_000 Maniacs
10,1000names
10,S101 Strings Orchestra
10,D101 Strings Orchestra
I have written code to find the maximum value, but it works on the individual sales rows for all artists rather than on the total sales for each artist:
sort -nr file1.csv | awk 'BEGIN { FS="," }{ print $2; }'
Any help to solve this? Thanks.
Output of the following command:
awk -F, 'NR > 1 { sales[$9] += $3 } END { for(s in sales) print sales[s] FS s }' million_songs_metadata_and_sales.csv | sort -nr -k1 | head -10
903,10000 Maniacs
562,51717
513,12012
506,35007
350,37500 Yens
2788,7000 Dying Rats
2325,2002
2210,1001 Ways
1992,1349
1968,1200 Techniques
Upvotes: 0
Views: 65
Reputation: 44043
With GNU awk:
awk -F, 'NR > 1 { sales[$2] += $1 } END { PROCINFO["sorted_in"] = "@val_num_desc"; for(s in sales) print sales[s] FS s }' file1.csv
That is:
NR > 1 {                # from the second line onwards (to skip the header)
  sales[$2] += $1       # sum up the totals
}
END {                   # and in the end
  # GNU-specific: array traversal in numerically descending order of value
  PROCINFO["sorted_in"] = "@val_num_desc"
  for(s in sales) {     # print the lot.
    print sales[s] FS s
  }
}
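Other awks (mawk, BWK awk, BusyBox awk) generally treat PROCINFO as an ordinary array, so the assignment is accepted silently and the traversal order stays unspecified. A minimal sketch that uses gawk when it is on the PATH and otherwise falls back to sort; the function name total_sales is made up for illustration, and it assumes the same two-column layout as file1.csv:

```shell
#!/bin/sh
# Hypothetical helper: sum column 1 per artist (column 2), largest total first.
total_sales() {
    if command -v gawk >/dev/null 2>&1; then
        # GNU awk: traverse the array by descending numeric value.
        gawk -F, 'NR > 1 { sales[$2] += $1 }
                  END { PROCINFO["sorted_in"] = "@val_num_desc"
                        for (s in sales) print sales[s] FS s }' "$1"
    else
        # Any POSIX awk: unordered output, sorted afterwards on the first field only.
        awk -F, 'NR > 1 { sales[$2] += $1 }
                 END { for (s in sales) print sales[s] FS s }' "$1" |
            sort -t, -k1,1nr
    fi
}
```

Call it as total_sales file1.csv. Artists with equal totals may come out in a different relative order in the two branches.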
With plain awk:
awk -F, 'NR > 1 { sales[$2] += $1 } END { for(s in sales) print sales[s] FS s }' file1.csv | sort -nr
That is, remove the GNU-specific PROCINFO bit and pipe the result through sort -nr.
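One caveat with a bare sort -nr: in locales whose thousands separator is a comma (en_US.UTF-8, for instance), GNU sort can read a number straight across the field delimiter, so 903,10000 Maniacs is compared as 90310000 and lands above 2788,7000 Dying Rats. That is consistent with the ordering shown in the question's second output. A sketch that restricts the numeric key to the first comma-separated field; the sample rows are taken from that output, and the temporary file is only there to keep the example self-contained:

```shell
#!/bin/sh
# Sample rows from the question's second output (totals already computed by awk).
tmp=$(mktemp)
printf '%s\n' '903,10000 Maniacs' '2788,7000 Dying Rats' '2325,2002' '562,51717' > "$tmp"

# -t,      use the comma as the field separator
# -k1,1nr  sort on field 1 only, numerically, in descending order
sorted=$(sort -t, -k1,1nr "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```

This prints 2788,7000 Dying Rats first and 562,51717 last. In the pipeline above, the same options replace sort -nr: ... | sort -t, -k1,1nr | head -10.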
Upvotes: 3