Lua_learner
Lua_learner

Reputation: 45

sort to display top 2 entries of each value from column 1

I have a table, which contains 3 columns as below. The list of Product Brands and Products (column 1 and 2) run into few 100s of entries

Product Brand   |   Product     |   Product Revenue
A               |   biscuit     |   20
B               |   bread       |   33
C               |   candy       |   41
A               |   frozen      |   52
B               |   icecream    |   89
B               |   veggies     |   65
C               |   juice       |   43
D               |   coffee      |   78
E               |   tea         |   21
C               |   chips       |   44
A               |   chips       |   29

Using unix commands could I get the top 2 selling products for each brand (descending sort on revenue)?

Expected output

Product Brand   |   Product     |   Product Revenue
A               |   frozen      |   52
A               |   chips       |   29
B               |   icecream    |   89
B               |   veggies     |   65
C               |   chips       |   44
C               |   juice       |   43
D               |   coffee      |   78
etc

Upvotes: 1

Views: 62

Answers (3)

thanasisp
thanasisp

Reputation: 5975

sort -k1,1 -k5,5nr file | awk 'a[$1]++<2'

Sort them by first column and secondarily by last column, numeric and descending. Then awk keeps only the two first for every brand. This syntax means store each brand as key and increase its value for every appearence. This condition is true only for the first two occurences. And when it is true it prints the line.

Note: I am not sure from the example if you do have | separators, I assumed you have and always have spaces around them, if not just replace the column number you want. In case you need to set the separator for any of these commands, its -t'|' for sort and -F'|' for awk.

Upvotes: 0

Srini V
Srini V

Reputation: 11355

head -n1 test  && tail -n+2 test | sort -k1,1 -k3r,3 | awk 'word!=$1{count=1;word=$1} count<=2{print; count++}'
Product Brand  |   Product     |   Product Revenue
A               |   frozen      |   52
A               |   chips       |   29
B               |   veggies     |   65
B               |   icecream    |   89
C               |   juice       |   43
C               |   chips       |   44
D               |   coffee      |   78
E               |   tea         |   21

Explanation:

  • You can isolate the header with command head -n1 yourfile which will print only the first line of your file = your header.
  • && combining commands
  • tail -n+2 test takes all rows but your header
  • sort -k1,1 -k3r,3 - Sort on first field ascending and then 3rd field descending
  • awk 'word!=$1{count=1;word=$1} count<=2{print; count++}' - When the first word on the current line doesn't match the last seen word reset our count and store the new word: word!=$1{count=1;word=$1} When our count is less than or equal to two print the line and increment count: count<=2{print; count++}

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133518

Could you please try following and let me know if this helps you.

sort -t"|" -k1.1 -k3.1 Input_file | tac | awk -F' +| +' 'NR>1{if(++a[$1]<3){print | "sort -k1"};next} 1'

Adding non-one liner form of solution too now.

sort -t"|" -k1.1 -k3.1 Input_file |
tac |
awk -F' +| +' '
  NR>1{
       if(++a[$1]<3){
         print | "sort -k1"
  };
next
}
1'

Upvotes: 0

Related Questions