Reputation: 17
So the current table I have looks likes below:
Edu Level | North | South | East |
---|---|---|---|
1 | .10 | .0002 | .77 |
2 | .004 | .0034 | .3498 |
3 | .150 | .357 | .548 |
I formed this table by just taking the proportion of people who had the outcome interest in each group.
This is the code I used to make that table. First I had and expanded table:
EduLevel_prop <- dta %>%
filter(!is.na(Diab))%>%
filter(!is.na(edulevel)) %>%
group_by(edulevel, region_name) %>%
count(Diab) %>%
mutate(perc = prop.table(n))
Then I condensed with it with the following code (this is the code for the table above):
wide_EduLevelprop <- EduLevel_prop %>%
filter(Diab==1)%>%
select(-n)%>%
spread(region_name,perc)
--
I want to transform the table such that the proportion is in the context of a 95% CI. I put the example of row one in the table below.
Edu Level | North | South | East |
---|---|---|---|
95% CI | 95% CI | 95% CI | 95% CI |
1 | .10 (0.09-.12) | .0002 (0.001-0.0003) | .77 (0.69-0.82) |
Upvotes: 1
Views: 139
Reputation: 4272
The below code produces some data similar to your own, estimates the proportions and confidence intervals, and then produces a table where the proportion point estimate and interval are in the same cell.
library(dplyr)
library(tibble)
library(purrr)
library(tidyr)
set.seed(123)
# Generate example data
N <- 100
df <- tibble(edu = sample(1:3, N, replace = TRUE),
region = sample(c("north","south"), N, replace = TRUE),
outcome = sample(0:1, N, replace = TRUE))
df %>% head(10)
#> # A tibble: 10 x 3
#> edu region outcome
#> <int> <chr> <int>
#> 1 3 north 1
#> 2 3 south 1
#> 3 3 south 0
#> 4 2 north 0
#> 5 3 north 1
#> 6 2 north 0
#> 7 2 south 0
#> 8 2 north 1
#> 9 3 south 1
#> 10 1 south 0
prop.test.info()
# Function that takes a 1-row data.frame, conducts a one-group test, and
# extracts inferential quantities in a data.frame
prop.test.info <- function(df_row) {
# Conduct one-sample t-test
result <- df_row %>%
prop.test(
x = .$successes,
p = .$h0,
n = .$sample_size,
alternative = "two.sided",
conf.level = 0.95
)
# Return CIs in a data.frame
data.frame(
ci_low = result$conf.int[1],
ci_high = result$conf.int[2],
prop = result$estimate
)
}
data.frame
# Calculate sample sizes and cumulative successes
prop_df <- df %>% group_by(edu, region) %>%
summarize(
sample_size = n(),
successes = sum(outcome)
) %>%
ungroup()
# Add column for null hypothesis of 0.5
prop_df <- prop_df %>% mutate(h0 = 0.5, id = row_number())
# Conduct tests, add inferential quantities, round values
out <- prop_df %>%
split(.$id) %>%
map( ~ prop.test.info(.x) %>%
bind_cols(.x)) %>%
bind_rows %>%
mutate(across(where(is.numeric), ~ round(.x, 2)))
# Add combine CI and point estimate character variable, drop all variables
# not needed for table
tabvars <- out %>%
mutate(est = paste0(prop, " (", ci_low, " - ", ci_high, ")")) %>%
select(edu, region, est)
tabvars %>%
pivot_wider(names_from = region, values_from = est)
#> # A tibble: 3 x 3
#> edu north south
#> <dbl> <chr> <chr>
#> 1 1 0.31 (0.12 - 0.59) 0.41 (0.19 - 0.67)
#> 2 2 0.36 (0.12 - 0.68) 0.43 (0.23 - 0.66)
#> 3 3 0.59 (0.33 - 0.81) 0.56 (0.31 - 0.78)
Upvotes: 1