Reputation: 65
This should be simple but I have been stumped by it: I am trying to figure out an efficient method for obtaining summary stats of a grouped count. Here's a toy example:
library(dplyr)

df = tibble(pid = c(1,2,2,3,3,3,4,4,4,4), y = rnorm(10))
df %>% group_by(pid) %>% count(pid)
which outputs the expected result:
# A tibble: 4 × 2
# Groups:   pid [4]
    pid     n
  <dbl> <int>
1     1     1
2     2     2
3     3     3
4     4     4
However, what if I want a summary of those grouped counts? Neither mutating a new variable nor using add_count has worked, I assume because the results are a different size from the data. For instance:
df %>% group_by(pid) %>% count(pid) %>% mutate(count = summary(n))
generates an error. What would be a simple way to generate summary statistics of the grouped counts (e.g., min, max, mean, etc.)?
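A small sketch of why the mutate() call fails, assuming dplyr is loaded and using the same toy data: summary() on a numeric vector returns six statistics, while each pid group in the grouped count has only one row. mutate() only accepts a result of length 1 (recycled) or exactly the group size, so a six-value result cannot be placed into a one-row group. The object names counts, stats_len, group_rows and res are illustrative.

```r
library(dplyr)

df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# The grouped count has exactly one row per pid group
counts <- df %>% group_by(pid) %>% count(pid)

# summary() of a numeric vector returns six values:
# Min., 1st Qu., Median, Mean, 3rd Qu., Max.
stats_len <- length(summary(counts$n))

# Every group in the grouped count has a single row
group_rows <- counts %>% summarise(rows = n()) %>% pull(rows)

# So mutate() cannot fit six values into a one-row group and errors
res <- tryCatch(
  counts %>% mutate(count = summary(n)),
  error = function(e) e
)
inherits(res, "error")  # TRUE
```

The mismatch is between the size of the result and the size of each group, which is why pulling the counts out of the data frame (or collapsing them with a summarising verb) avoids the error.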
Upvotes: 0
Views: 44
Reputation: 145745
mutate is for adding columns to a data frame; you don't want that here. Instead, you need to pull the column out of the data frame.
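library(dplyr)

df %>%
  count(pid) %>%   # one row per pid, with its count in column n
  pull(n) %>%      # extract the counts as a plain vector
  summary()        # Min., 1st Qu., Median, Mean, 3rd Qu., Max.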
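If you would rather keep the statistics in a data frame (for example, to choose exactly which ones you want, such as adding the standard deviation), summarise() can be applied to the count output instead of pulling the column. This is a minimal sketch, assuming dplyr is loaded; the column names min_n, median_n, mean_n, max_n and sd_n are illustrative choices. The final lines show a base R alternative that uses table() to count rows per pid, which is a different technique from the dplyr pipeline.

```r
library(dplyr)

df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# Option 1: keep the statistics in a one-row tibble with summarise()
count_stats <- df %>%
  count(pid) %>%          # ungrouped result: one row per pid, counts in n
  summarise(
    min_n    = min(n),
    median_n = median(n),
    mean_n   = mean(n),
    max_n    = max(n),
    sd_n     = sd(n)
  )
count_stats

# Option 2: base R; table() counts the rows for each pid value
base_stats <- summary(as.vector(table(df$pid)))
base_stats
```

Because count(pid) is called on the ungrouped data, summarise() collapses the whole count table into a single row rather than one row per pid.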
Upvotes: 3