Nicosc

Reputation: 323

Calculating confidence interval for group proportions in dplyr

I want to calculate confidence intervals for group proportions in dplyr. I tried some code based on this website, but I could not make it work.

Sample data:

lebanon <- structure(list(sect = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L), .Label = c("Maronite", "Orthodox", "Catholic", "Armenian", 
"Sunni", "Shia", "Druze", "Just a Muslim", "Other", "Don't know"
), class = "factor"), social_trust = structure(c(1L, 1L, 1L, 
1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("I must be very careful in dealing with people", 
"Most people can be trusted", "Don't know"), class = "factor")), row.names = c(NA, 
200L), class = "data.frame")

I used this code to get the following table:

lebanon %>%
  filter(!is.na(social_trust), !is.na(sect), sect != "Armenian", sect != "Just a Muslim",
         sect != "Other") %>%
  group_by(sect) %>%
  count(social_trust) %>%
  mutate(prop = n / sum(n))


   sect     social_trust                                      n    prop
   <fct>    <fct>                                         <int>   <dbl>
 1 Maronite I must be very careful in dealing with people   613 0.968  
 2 Maronite Most people can be trusted                       19 0.0300 
 3 Maronite Don't know                                        1 0.00158
 4 Orthodox I must be very careful in dealing with people   152 0.944  
 5 Orthodox Most people can be trusted                        6 0.0373 
 6 Orthodox Don't know                                        3 0.0186 
 7 Catholic I must be very careful in dealing with people   107 0.915  
 8 Catholic Most people can be trusted                        9 0.0769 
 9 Catholic Don't know                                        1 0.00855
10 Sunni    I must be very careful in dealing with people   639 0.980  
11 Sunni    Most people can be trusted                        3 0.00460
12 Sunni    Don't know                                       10 0.0153 
13 Shia     I must be very careful in dealing with people   549 0.918  
14 Shia     Most people can be trusted                       32 0.0535 
15 Shia     Don't know                                       17 0.0284 
16 Druze    I must be very careful in dealing with people   175 0.921  
17 Druze    Most people can be trusted                       15 0.0789

Ideally, I would like the lower and upper confidence limits for each group in columns next to prop. Any ideas?

Upvotes: 2

Views: 1836

Answers (1)

Allan Cameron

Reputation: 173793

You can call prop.test on each count with lapply inside mutate, then pull the two interval bounds out of each result:

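library(dplyr)

lebanon %>%
  filter(!is.na(social_trust), 
         !is.na(sect), 
         sect != "Armenian", 
         sect != "Just a Muslim",
         sect != "Other") %>%
  group_by(sect) %>%
  count(social_trust) %>% 
  # sum(n) is the group total because the data are still grouped by sect
  mutate(prop = n / sum(n), 
         # temporarily store one prop.test result (an htest object) per row
         lower = lapply(n, prop.test, n = sum(n)), 
         # extract the upper bound first, while lower still holds the test objects
         upper = sapply(lower, function(x) x$conf.int[2]), 
         # then overwrite lower with the lower bound
         lower = sapply(lower, function(x) x$conf.int[1]))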

#> # A tibble: 2 x 6
#> # Groups:   sect [1]
#>   sect  social_trust                                      n  prop   lower  upper
#>   <fct> <fct>                                         <int> <dbl>   <dbl>  <dbl>
#> 1 Sunni I must be very careful in dealing with people   198  0.99 0.961   0.998 
#> 2 Sunni Don't know                                        2  0.01 0.00173 0.0395
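
To see what the pipeline does for each row, here is a minimal base R sketch using the two counts from the sample output (198 and 2 out of 200). The helper name prop_ci is hypothetical and not part of dplyr or stats; it only wraps prop.test, which by default returns a 95% interval with continuity correction.

```r
# Counts for the single group in the sample data (198 + 2 = 200 rows)
n <- c(198L, 2L)

# prop.test(x, n) returns an "htest" object; its conf.int element is a
# length-2 numeric vector holding the lower and upper bounds
ci_single <- prop.test(198, 200)$conf.int

# prop_ci is a hypothetical helper: one row of c(lower, upper) per count,
# with the sum of all counts used as the number of trials
prop_ci <- function(counts, conf.level = 0.95) {
  total <- sum(counts)
  bounds <- vapply(
    counts,
    function(x) as.numeric(prop.test(x, total, conf.level = conf.level)$conf.int),
    numeric(2)
  )
  ci <- t(bounds)  # vapply returns 2 x length(counts); transpose to rows
  colnames(ci) <- c("lower", "upper")
  ci
}

ci <- prop_ci(n)
ci
```

The first row should agree with the lower and upper values shown for the 198-count row above. A different conf.level can be passed if a 95% interval is not what you need.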

Upvotes: 4
