Andy Carlson

Reputation: 3909

Count all values found within a grouped dataframe

Suppose I have some demographic data like this:

demographic.data <- data.frame(nation=c('us', 'us', 'us', 'us', 'us', 'china', 'china', 'china'),
                               religion=c('christ', 'budhist', 'christ', 'jew', 'jew', 'christ', 'budhist', 'budhist'))

#  nation religion
#1     us   christ
#2     us  budhist
#3     us   christ
#4     us      jew
#5     us      jew
#6  china   christ
#7  china  budhist
#8  china  budhist

I want to calculate a mass function for the religions within each nation. I could group_by() the nation and then summarise with one sum() per religion:

library(dplyr)

religion.distributions <- demographic.data %>%
  group_by(nation) %>%
  summarise(n       = n(),                          # rows per nation
            christ  = sum(religion == 'christ'),    # one hard-coded count per religion
            jew     = sum(religion == 'jew'),
            budhist = sum(religion == 'budhist'))

#  nation     n christ   jew budhist
#
#1 china      3      1     0       2
#2 us         5      2     2       1

Although this produces the correct result for this data, the problem is that I am required to hard-code the religions I want to sum up. This will be a problem if any new religions appear in the data.

Is there a way to automatically have columns for the count of every religion within each group? It should be able to look at all the possible values in the religion column and start counting them. Solutions that use a dplyr pipeline would be most elegant.

Upvotes: 1

Views: 38

Answers (1)

akrun

Reputation: 886938

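We can use count() and then spread() the counts into one column per religion. Because a column named n already exists, count() names its new count column nn.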

library(tidyverse)
demographic.data %>% 
    group_by(nation) %>% 
    mutate(n = n()) %>% 
    count(nation, religion, n) %>% 
    spread(religion, nn, fill = 0)
# A tibble: 2 x 5
# Groups:   nation [2]
#  nation     n budhist christ   jew
#  <fct>  <int>   <dbl>  <dbl> <dbl>
#1 china      3       2      1     0
#2 us         5       1      2     2
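Since the question asks for a mass function rather than raw counts, here is a minimal sketch of the same idea that returns within-nation proportions. It uses pivot_wider(), which superseded spread() in tidyr 1.0.0. The names count, share and religion.pmf are only illustrative, and the sketch assumes a dplyr version where count() accepts the name argument.

```r
library(dplyr)
library(tidyr)

demographic.data <- data.frame(
  nation   = c('us', 'us', 'us', 'us', 'us', 'china', 'china', 'china'),
  religion = c('christ', 'budhist', 'christ', 'jew', 'jew', 'christ', 'budhist', 'budhist')
)

religion.pmf <- demographic.data %>%
  # one row per nation/religion pair that occurs in the data
  count(nation, religion, name = "count") %>%
  group_by(nation) %>%
  # n is the nation total; share is the proportion within that nation
  mutate(n = sum(count), share = count / n) %>%
  ungroup() %>%
  select(-count) %>%
  # one column per religion; pairs that never occur get a share of 0
  pivot_wider(names_from = religion, values_from = share,
              values_fill = list(share = 0))

print(religion.pmf)
```

No religion is hard-coded, so a new value in the religion column simply becomes a new column. For the sample data each row of shares sums to 1: china is 2/3 budhist and 1/3 christ, and us is 1/5 budhist, 2/5 christ and 2/5 jew.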

Upvotes: 1
