I've been trying a few ways to achieve this (do(), row_number()) but I'm still stuck.
I have 3 grouping variables: month, city, and gender.
I would like to get only the top 5 counts after grouping by all three.
This code works fine with only 2 grouping variables:
df_top5_2grp <- df %>%
  group_by(month, city) %>%
  tally() %>%
  top_n(n = 5, wt = n) %>%
  arrange(month, desc(n))
However, it doesn't return the top 5 counts when I add a third grouping variable:
df_top5_3grp <- df %>%
  group_by(month, city, gender) %>%
  tally() %>%
  top_n(n = 5, wt = n) %>%
  arrange(month, gender, desc(n))
It returns all rows instead. The only difference is that I added gender.
Any help is appreciated. Thanks!
You probably need an ungroup() in there.
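In the first example below, every row comes back. tally() is a summarization, and summarizing drops only the last grouping variable, so the result is still grouped by cyl and vs: 5 groups holding at most 2 rows each. Taking the top 5 within each of those groups therefore keeps all 7 rows. The same thing happens in your three-variable case: after tally() the data is still grouped by month and city, so top_n() picks the top 5 genders within each month/city pair, which is every row. (Your two-variable version only appeared to work because it was still grouped by month, so it returned the top 5 cities within each month.)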
library(dplyr)

mtcars %>%
  group_by(cyl, vs, am) %>% # grouping across three variables
  tally() %>%               # tally is a summarization that removes the last grouping (am)
  top_n(n = 5, wt = n)
# A tibble: 7 x 4
# Groups: cyl, vs [5] # NOTE! This reminds us the data is still grouped
cyl vs am n
<dbl> <dbl> <dbl> <int>
1 4 0 1 1
2 4 1 0 3
3 4 1 1 7
4 6 0 1 3
5 6 1 0 4
6 8 0 0 12
7 8 0 1 2
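If you want to confirm which grouping survives the tally(), dplyr's group_vars() and n_groups() report it directly. A small sketch on the same mtcars pipeline (the name tallied is just an illustrative variable):

```r
library(dplyr)

# Same pipeline as the first example, stopped right after tally()
tallied <- mtcars %>%
  group_by(cyl, vs, am) %>%
  tally()

# tally() summarises, which drops only the last grouping variable (am)
group_vars(tallied)  # "cyl" "vs"
n_groups(tallied)    # 5 groups, each holding at most 2 rows
```

Because no (cyl, vs) group has more than 2 rows, any top_n(n = 5) applied at this point keeps everything.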
Adding ungroup() makes the top 5 filtering happen across all the summarized rows rather than within each remaining group.
mtcars %>%
  group_by(cyl, vs, am) %>%
  tally() %>%
  ungroup() %>%
  top_n(n = 5, wt = n)
# A tibble: 5 x 4
cyl vs am n
<dbl> <dbl> <dbl> <int>
1 4 1 0 3
2 4 1 1 7
3 6 0 1 3
4 6 1 0 4
5 8 0 0 12
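On dplyr 1.0.0 or later, top_n() is superseded by slice_max(). A sketch of the same query, assuming a recent dplyr, uses count(), which is shorthand for group_by() + tally() and returns an ungrouped result when its input is ungrouped, so no explicit ungroup() is needed. The second pipeline shows the per-group variant (the analogue of "top cities per month" in the question); n = 2 there is only because mtcars has few combinations per cyl, and the object names are illustrative.

```r
library(dplyr)

# Top 5 (cyl, vs, am) combinations across the whole table.
# count() on ungrouped data returns an ungrouped tibble, so slice_max()
# ranks all rows together; with_ties = TRUE by default, like top_n().
top5_overall <- mtcars %>%
  count(cyl, vs, am) %>%
  slice_max(n, n = 5)

# Top 2 (vs, am) combinations within each cyl: regroup by the single
# variable you want to rank within, then drop the grouping afterwards.
top2_per_cyl <- mtcars %>%
  count(cyl, vs, am) %>%
  group_by(cyl) %>%
  slice_max(n, n = 2) %>%
  ungroup()
```

For the question's data the same pattern would be count(month, city, gender) followed by slice_max(n, n = 5), optionally with group_by(month) in between if the top 5 should be computed per month.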