Dplyr pipe groupby top_n does not get top_n in group

Question

I'm trying to obtain the top 2 names, sorted alphabetically, per group. I expected top_n() to select these after a group_by(), but that does not seem to be the case. The code below shows the problem.

library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"))

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)

df

# A tibble: 2 x 2
# Groups:   Group [1]
  Group Name
1     1 e
2     1 f

Expected output would be:

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)
df

      Group Name
1     0    a
2     0    b
3     1    d
4     1    e

Or something similar. Thanks.

Ronak Shah · Accepted Answer

top_n() selects the n rows with the largest values. You seem to need the n rows with the smallest values, which you can get by passing a negative n. Additionally, you don't need to arrange the data before using top_n().

library(dplyr)
df %>% group_by(Group) %>% top_n(-2, Name)


#  Group Name
#1     0 a
#2     0 b
#3     1 e
#4     1 d
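
To make the effect of the sign of n concrete, here is a minimal sketch that runs both variants on the data from the question. The names largest, smallest and smallest_sorted are only illustrative, and stringsAsFactors = FALSE is an assumption added so that Name is a character column on any R version. Note that top_n() filters rows without reordering them, which is why group 1 comes back as e, d above; an explicit arrange() afterwards gives the alphabetical order asked for.

```r
library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)  # assumption: keep Name as character

# Positive n keeps the rows with the largest Name in each group.
largest <- df %>% group_by(Group) %>% top_n(2, Name) %>% ungroup()

# Negative n keeps the rows with the smallest Name in each group.
smallest <- df %>% group_by(Group) %>% top_n(-2, Name) %>% ungroup()

# top_n() only filters, so sort explicitly if the row order matters.
smallest_sorted <- smallest %>% arrange(Group, Name)
```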

Another way is to arrange the data and select the first two rows in each group.

df %>% arrange(Group, Name) %>% group_by(Group) %>% slice(1:2)
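
If you are on dplyr 1.0.0 or later, where top_n() is superseded, slice_min() may be a clearer way to express the same thing. This is a sketch under that version assumption rather than part of the original answer; with_ties = FALSE is added so that exactly two rows per group are returned even if names repeat, and result is just an illustrative name.

```r
library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)  # assumption: keep Name as character

# Keep the two alphabetically smallest names within each group.
result <- df %>%
  group_by(Group) %>%
  slice_min(Name, n = 2, with_ties = FALSE) %>%
  ungroup()

result
```

Unlike top_n(), slice_min() returns the selected rows ordered by the ranking column, so no separate arrange() step should be needed here.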
