AngryPanda

Reputation: 1281

BASH: Find maximum value in repeating groups

I have the following CSV file, file1.csv:

sales,artist
10,0131
10,0131
10,10_000 Maniacs
10,1000names
15,E1001 Ways
15,E1001 Ways
10,S101 Strings Orchestra
10,D101 Strings Orchestra
10,x0cc
10,x0cc

I am writing a Bash command to find the total sales for every artist. The output should be sorted by total sales in descending order.

Expected output:

30,E1001 Ways
20,0131
20,x0cc
10,10_000 Maniacs
10,1000names
10,S101 Strings Orchestra
10,D101 Strings Orchestra

I have written the following command, but it only orders the rows by sales value across all artists; it does not compute the total sales for each artist.

 sort -nr file1.csv | awk 'BEGIN { FS="," }{ print $2; }'

Any help to solve this? Thanks.

Output

awk -F, 'NR > 1 { sales[$9] += $3 } END { for(s in sales) print sales[s] FS s }' million_songs_metadata_and_sales.csv | sort -nr -k1 | head -10

903,10000 Maniacs
562,51717
513,12012
506,35007
350,37500 Yens
2788,7000 Dying Rats
2325,2002
2210,1001 Ways
1992,1349
1968,1200 Techniques

Upvotes: 0

Views: 65

Answers (1)

Wintermute

Reputation: 44043

With GNU awk:

awk -F, 'NR > 1 { sales[$2] += $1 } END { PROCINFO["sorted_in"] = "@val_num_desc"; for(s in sales) print sales[s] FS s }' file1.csv

That is:

NR > 1 {                 # from the second line onwards (to skip the header)
  sales[$2] += $1        # sum up the totals
}
END {                    # and in the end

  # GNU-specific: array traversal in numerically descending order of value
  PROCINFO["sorted_in"] = "@val_num_desc"

  for(s in sales) {      # print the lot.
    print sales[s] FS s
  }
}

With plain awk:

awk -F, 'NR > 1 { sales[$2] += $1 } END { for(s in sales) print sales[s] FS s }' file1.csv | sort -nr

That is, remove the GNU-specific PROCINFO bit and pipe the result through sort -nr.
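One caveat with the pipe to sort: GNU sort -n accepts thousands separators, so in a locale where the comma is the thousands separator, a line such as 903,10000 Maniacs may be read as one large number rather than as 903 followed by a name. The ten-line output in the question looks consistent with that, though this is a guess from the numbers shown. A minimal sketch of a more defensive variant follows; it assumes the artist names contain no commas, and the total_sales function name is only illustrative.

```shell
#!/bin/sh
# Sum the sales per artist, then sort on the first comma-separated field only.
# -t, sets the field separator for sort, and -k1,1nr limits the numeric,
# descending key to field 1, so the comma is never parsed as part of a number.
# LC_ALL=C is an extra precaution against locale-dependent number parsing.
total_sales() {
  awk -F, 'NR > 1 { sales[$2] += $1 } END { for (s in sales) print sales[s] FS s }' "$1" |
    LC_ALL=C sort -t, -k1,1nr
}

# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
sales,artist
10,0131
10,0131
10,10_000 Maniacs
10,1000names
15,E1001 Ways
15,E1001 Ways
10,S101 Strings Orchestra
10,D101 Strings Orchestra
10,x0cc
10,x0cc
EOF

total_sales "$tmp"
rm -f "$tmp"
```

Artists with equal totals can appear in a different order than in the expected output, since only the first field is used as the sort key. Append | head -10 to keep just the top ten.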

Upvotes: 3
