Reputation: 3909
Suppose I have some demographic data like this:
demographic.data <- data.frame(nation=c('us', 'us', 'us', 'us', 'us', 'china', 'china', 'china'),
religion=c('christ', 'budhist', 'christ', 'jew', 'jew', 'christ', 'budhist', 'budhist'))
# nation religion
#1 us christ
#2 us budhist
#3 us christ
#4 us jew
#5 us jew
#6 china christ
#7 china budhist
#8 china budhist
I want to calculate a mass function for religions within each nation. I could group_by() the nation and then aggregate with a bunch of sum() calls:
library(dplyr)
religion.distributions <- demographic.data %>%
group_by(nation) %>%
summarise(n = n(),
christ = sum(religion == 'christ'),
jew = sum(religion == 'jew'),
budhist = sum(religion == 'budhist'))
# nation n christ jew budhist
#
#1 china 3 1 0 2
#2 us 5 2 2 1
Although this produces the correct result for this data, the problem is that I am required to hard-code the religions I want to sum up. This will be a problem if any new religions appear in the data.
Is there a way to automatically get a column with the count of every religion within each group? It should look at all the values that occur in the religion column and count each of them. Solutions that use a dplyr pipeline would be most elegant.
Upvotes: 1
Views: 38
Reputation: 886938
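We can use spread() with count(). Because a column named n already exists when count() is called, the new counts are stored in a column named nn, which is the column we then spread: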
library(tidyverse)
demographic.data %>%
group_by(nation) %>%
mutate(n = n()) %>%
count(nation, religion, n) %>%
spread(religion, nn, fill = 0)
# A tibble: 2 x 5
# Groups: nation [2]
# nation n budhist christ jew
# <fct> <int> <dbl> <dbl> <dbl>
#1 china 3 2 1 0
#2 us 5 1 2 2
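Since spread() has been superseded by pivot_wider() in newer tidyr releases, the same idea can also be sketched as below. This is only a sketch, assuming dplyr 1.0 or later and tidyr 1.1 or later; the column name count is my own choice for illustration and is not part of the original data.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
demographic.data <- data.frame(
  nation   = c('us', 'us', 'us', 'us', 'us', 'china', 'china', 'china'),
  religion = c('christ', 'budhist', 'christ', 'jew', 'jew', 'christ', 'budhist', 'budhist')
)

religion.distributions <- demographic.data %>%
  # attach the total number of rows per nation as column n
  add_count(nation, name = "n") %>%
  # one row per nation/religion pair; "count" is a hypothetical column name
  count(nation, n, religion, name = "count") %>%
  # one column per religion found in the data; absent combinations become 0
  pivot_wider(names_from = religion, values_from = count, values_fill = 0)
```

Naming the count column explicitly avoids relying on the automatic nn name, and any new religion that shows up in the data gets its own column without changes to the code. Dividing the religion columns by n would then give the proportions within each nation.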
Upvotes: 1