Reputation: 323
I want to calculate confidence intervals for group proportions in dplyr. I tried some code based on this website, but I could not get it to work.
Sample data:
structure(list(sect = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L), .Label = c("Maronite", "Orthodox", "Catholic", "Armenian",
"Sunni", "Shia", "Druze", "Just a Muslim", "Other", "Don't know"
), class = "factor"), social_trust = structure(c(1L, 1L, 1L,
1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("I must be very careful in dealing with people",
"Most people can be trusted", "Don't know"), class = "factor")), row.names = c(NA,
200L), class = "data.frame")
I used this code to get the table below:
lebanon %>%
  filter(!is.na(social_trust), !is.na(sect), sect != "Armenian", sect != "Just a Muslim",
         sect != "Other") %>%
  group_by(sect) %>%
  count(social_trust) %>%
  mutate(prop = n / sum(n))
   sect     social_trust                                      n prop
   <fct>    <fct>                                         <int> <dbl>
 1 Maronite I must be very careful in dealing with people   613 0.968
 2 Maronite Most people can be trusted                       19 0.0300
 3 Maronite Don't know                                        1 0.00158
 4 Orthodox I must be very careful in dealing with people   152 0.944
 5 Orthodox Most people can be trusted                        6 0.0373
 6 Orthodox Don't know                                        3 0.0186
 7 Catholic I must be very careful in dealing with people   107 0.915
 8 Catholic Most people can be trusted                        9 0.0769
 9 Catholic Don't know                                        1 0.00855
10 Sunni    I must be very careful in dealing with people   639 0.980
11 Sunni    Most people can be trusted                        3 0.00460
12 Sunni    Don't know                                       10 0.0153
13 Shia     I must be very careful in dealing with people   549 0.918
14 Shia     Most people can be trusted                       32 0.0535
15 Shia     Don't know                                       17 0.0284
16 Druze    I must be very careful in dealing with people   175 0.921
17 Druze    Most people can be trusted                       15 0.0789
Ideally, I would like to have the lower and upper confidence limits for each row in each group, in columns next to prop. Any ideas?
Upvotes: 2
Views: 1836
Reputation: 173793
You can mutate using prop.test inside an lapply:
lebanon %>%
  filter(!is.na(social_trust),
         !is.na(sect),
         sect != "Armenian",
         sect != "Just a Muslim",
         sect != "Other") %>%
  group_by(sect) %>%
  count(social_trust) %>%
  mutate(prop = n / sum(n),
         # lower temporarily holds one prop.test result per row
         lower = lapply(n, prop.test, n = sum(n)),
         # extract the upper limit first, then overwrite lower with the lower limit
         upper = sapply(lower, function(x) x$conf.int[2]),
         lower = sapply(lower, function(x) x$conf.int[1]))
#> # A tibble: 2 x 6
#> # Groups:   sect [1]
#>   sect  social_trust                                      n  prop lower   upper
#>   <fct> <fct>                                         <int> <dbl> <dbl>   <dbl>
#> 1 Sunni I must be very careful in dealing with people   198  0.99 0.961   0.998
#> 2 Sunni Don't know                                        2  0.01 0.00173 0.0395
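If you would rather not reuse the lower column as a temporary list, the same idea can be sketched with a small helper. Treat this as a sketch under assumptions: prop_ci is a hypothetical name (it is not part of dplyr or stats), and the counts are the three Maronite rows copied from the table in the question rather than the full data. Note that prop.test gives a Wilson score interval with continuity correction by default; if you want exact (Clopper-Pearson) limits, binom.test also returns a conf.int element and could be swapped in.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# lower and upper limits of the prop.test() interval for x out of n.
prop_ci <- function(x, n, conf.level = 0.95) {
  ci <- prop.test(x, n, conf.level = conf.level)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# Counts for one sect, copied from the table in the question.
counts <- data.frame(
  sect = "Maronite",
  social_trust = c("I must be very careful in dealing with people",
                   "Most people can be trusted",
                   "Don't know"),
  n = c(613L, 19L, 1L)
)

result <- counts %>%
  group_by(sect) %>%
  mutate(total = sum(n),
         prop  = n / total,
         # one test per row, stored in a temporary list column
         ci    = Map(prop_ci, n, total),
         lower = vapply(ci, `[[`, numeric(1), "lower"),
         upper = vapply(ci, `[[`, numeric(1), "upper")) %>%
  select(-ci) %>%
  ungroup()

print(result)
```

Each test is run only once per row, and the temporary ci column is dropped at the end, so the result has plain numeric lower and upper columns next to prop.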
Upvotes: 4