Karrtik Iyer
Karrtik Iyer

Reputation: 131

Dplyr: Summarise, mutate and rank within group

When I execute following query on mtcars data set, I get below results.

mtcars %>% 
   group_by(cyl,gear) %>% 
   summarise(total_cnt = n(), totalwt = sum(wt)) %>% 
   arrange(cyl, gear, desc(total_cnt), desc(totalwt)) %>% 
   mutate(rank = dense_rank(desc(total_cnt))) %>% 
   arrange(rank)

 cyl  gear total totalwt  rank
  <dbl> <dbl> <int>   <dbl> <int>
1     4     4     8  19.025     1
2     6     4     4  12.375     1
3     8     3    12  49.249     1
4     4     5     2   3.653     2
5     6     3     2   6.675     2
6     8     5     2   6.740     2
7     4     3     1   2.465     3
8     6     5     1   2.770     3

Now within each group (of ranks), I want to sub rank the observations based on totalwt, so final output should look like (desc order of totalwt within each rank group)

cyl  gear total_cnt totalwt  rank    subrank
  <dbl> <dbl>     <int>   <dbl> <int>   <int>
1     4     4         8  19.025     1    2
2     6     4         4  12.375     1    3
3     8     3        12  49.249     1    1
4     4     5         2   3.653     2    3
5     6     3         2   6.675     2    2
6     8     5         2   6.740     2    1
7     4     3         1   2.465     3    2
8     6     5         1   2.770     3    1

Then finally top 1 where each rank where sub rank = 1, so output would be:

  cyl  gear total_cnt totalwt  rank    subrank
  <dbl> <dbl>     <int>   <dbl> <int>   <int>
3     8     3        12  49.249     1    1
6     8     5         2   6.740     2    1
8     6     5         1   2.770     3    1

Upvotes: 2

Views: 4037

Answers (1)

akrun
akrun

Reputation: 887601

If 'mtcars1' is output from the OP's code, we can use rank to create the 'subrank' after grouping by 'rank'

mtcars2 <- mtcars1 %>%
               group_by(rank) %>%
               mutate(subrank = rank(-totalwt))
mtcars2
#    cyl  gear total_cnt totalwt  rank subrank
#   <dbl> <dbl>     <int>   <dbl> <int>   <dbl>
#1     4     4         8  19.025     1       2
#2     6     4         4  12.375     1       3
#3     8     3        12  49.249     1       1
#4     4     5         2   3.653     2       3
#5     6     3         2   6.675     2       2
#6     8     5         2   6.740     2       1
#7     4     3         1   2.465     3       2
#8     6     5         1   2.770     3       1

Then, we filter the rows where 'subrank' is 1

mtcars2 %>% 
      filter(subrank ==1)
#    cyl  gear total_cnt totalwt  rank subrank
#   <dbl> <dbl>     <int>   <dbl> <int>   <dbl>
#1     8     3        12  49.249     1       1
#2     8     5         2   6.740     2       1
#3     6     5         1   2.770     3       1

Upvotes: 3

Related Questions