aarsmith

Reputation: 65

Obtaining a summary of grouped counts in R

This should be simple, but I have been stumped by it: I am trying to find an efficient way to obtain summary statistics of a grouped count. Here's a toy example:

library(dplyr)  # provides tibble(), %>%, group_by(), count()
df = tibble(pid = c(1,2,2,3,3,3,4,4,4,4), y = rnorm(10))
df %>% group_by(pid) %>% count(pid)

which outputs the expected

# A tibble: 4 × 2
# Groups:   pid [4]
    pid     n
  <dbl> <int>
1     1     1
2     2     2
3     3     3
4     4     4

However, what if I want a summary of those grouped counts? Attempting to mutate a new variable or use add_count hasn't worked, I assume because the variables are different sizes. For instance:

df %>% group_by(pid) %>% count(pid) %>% mutate(count = summary(n))

generates an error. What would be a simple way to generate summary statistics of the grouped counts (e.g., min, max, mean, etc.)?

Upvotes: 0

Views: 44

Answers (1)

Gregor Thomas

Reputation: 145745

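mutate is for adding columns to a data frame, which is not what you want here. summary(n) returns six values (min, quartiles, median, mean, max), and those cannot fit into a new column of a one-row group, hence the error. Instead, pull the count column out of the data frame as a vector and summarize that: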

df %>% 
  count(pid) %>% 
  pull(n) %>% 
  summary()
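
If the statistics are wanted as a one-row data frame rather than a printed summary, a minimal sketch along the same lines is to pass the counts to summarise() instead. This is an alternative to the pull() approach, not a replacement for it; the names count_stats, n_min, n_max, n_mean, and n_median are made up here for illustration.

```r
library(dplyr)

# Toy data from the question: pid 1 appears once, pid 2 twice, and so on
df <- tibble(pid = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4), y = rnorm(10))

# count(pid) returns one row per pid with the count in column n;
# summarise() then collapses those counts into a single row of statistics
count_stats <- df %>%
  count(pid) %>%
  summarise(
    n_min    = min(n),
    n_max    = max(n),
    n_mean   = mean(n),
    n_median = median(n)
  )

print(count_stats)
```

Because the result is a tibble, it can be joined or bound to other results, which the output of summary() does not allow as easily.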

Upvotes: 3
