Suppose I have this data set:
library(dplyr)

df <- data.frame(
  type  = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b'),
  value = c('c', 'c', 'd', 'e', 'f', 'c', 'e', 'f', 'f', 'f', 'g', 'h', 'f')
)
   type value
1     a     c
2     a     c
3     a     d
4     a     e
5     a     f
6     a     c
7     b     e
8     b     f
9     b     f
10    b     f
11    b     g
12    b     h
13    b     f
I'd like to perform some kind of command as follows:
df %>% group_by(type) %>%
  summarise_all(funs(largest_group_size))
This would ideally produce a table with the count of the most frequent value for each of a and b:
  type largest_group_size
1    a                  3
2    b                  4
Ideally, I'd also like to go a step further and calculate what share of each type's rows the largest group makes up, i.e. largest_group_size / n().
Reputation: 21
In two group_by steps:
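df %>%
  # Count how often each value occurs within each type
  group_by(type, value) %>%
  summarise(groups = n()) %>%
  # Per type, take the largest count and its share of all rows
  group_by(type) %>%
  summarise(largest_group = max(groups),
            as_percentage = largest_group / sum(groups))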
This gives:
  type  largest_group as_percentage
  <fct>         <dbl>         <dbl>
1 a                 3         0.5
2 b                 4         0.571
There is probably a more efficient way, but this is how I would do this in a hurry.
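For what it's worth, the same numbers can probably be had with a single group_by. This is only a sketch, assuming dplyr is installed and that df is the data frame from the question (rebuilt here so the snippet runs on its own). It uses base R's table() to count the values within each type; the name result is just illustrative.

```r
library(dplyr)

# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  type  = c('a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b'),
  value = c('c', 'c', 'd', 'e', 'f', 'c', 'e', 'f', 'f', 'f', 'g', 'h', 'f')
)

result <- df %>%
  group_by(type) %>%
  summarise(
    # table() counts each distinct value within the current type;
    # max() keeps the count of the most frequent one
    largest_group = max(table(value)),
    # n() is the number of rows in the current type
    as_percentage = largest_group / n()
  )

print(result)
```

Note that as_percentage is a proportion between 0 and 1; multiply it by 100 if an actual percentage is wanted.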