Menno Van Dijk

Reputation: 903

dplyr: top_n() after group_by() does not return the top n rows per group

I'm trying to obtain the first 2 names per group, sorted alphabetically. I expected top_n() to select these after a group_by(), but that does not seem to be the case. The following code shows the problem.

library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"))

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)

df

# A tibble: 2 x 2
# Groups:   Group [1]
  Group Name 
  <dbl> <chr>
1     1 e    
2     1 f 

The output I expected from the same pipeline is:

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)
df

      Group Name
1     0    a
2     0    b
3     1    d
4     1    e

Or something similar. Thanks.

Upvotes: 0

Views: 577

Answers (2)

akrun

Reputation: 886938

We can sort by Group and Name, then keep the first two rows of each group with row_number():

library(dplyr)
df %>% 
  arrange(Group, Name) %>% 
  group_by(Group) %>% 
  filter(row_number() < 3)

Upvotes: 0

Ronak Shah

Reputation: 388817

top_n() selects the rows with the n largest values. You seem to need the n smallest values instead, which you can get by passing a negative n. Additionally, you don't need to arrange the data before using top_n().

library(dplyr)
df %>% group_by(Group) %>% top_n(-2, Name)


#  Group Name 
#  <dbl> <chr>
#1     0 a    
#2     0 b    
#3     1 e    
#4     1 d    
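As a small sketch of the difference, using the data from the question (the variable names `largest`, `smallest` and `smallest_sorted` are only illustrative): a positive n keeps the largest values of Name in each group, a negative n keeps the smallest. Note that top_n() only filters and keeps the original row order, which is why "e" appears before "d" above, so an arrange() afterwards is needed if sorted output matters.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps Name as
# character on R versions before 4.0 as well
df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)

# Positive n: rows with the 2 largest Name values in each group
largest <- df %>% group_by(Group) %>% top_n(2, Name)

# Negative n: rows with the 2 smallest Name values in each group
smallest <- df %>% group_by(Group) %>% top_n(-2, Name)

# top_n() does not sort, so order the result explicitly if needed
smallest_sorted <- smallest %>% arrange(Group, Name)
```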

Another way is to arrange the data and select the first two rows in each group.

df %>% arrange(Group, Name) %>% group_by(Group) %>% slice(1:2)
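If dplyr 1.0.0 or later is available (an assumption, since the question does not state a version), top_n() is documented as superseded by slice_min() and slice_max(). A minimal sketch of the same selection with slice_min(), again on the question's data, with `res` as an illustrative name:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)

# Keep the 2 rows with the smallest Name in each group.
# with_ties defaults to TRUE, so tied values could yield more than 2 rows;
# set with_ties = FALSE to cap the result at exactly n rows per group.
res <- df %>%
  group_by(Group) %>%
  slice_min(Name, n = 2) %>%
  ungroup()
```

Unlike the arrange() plus slice(1:2) approach, this does not depend on the data having been sorted beforehand.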

Upvotes: 1
