Reputation: 65
I am new to posting on Stack Overflow (though not to reading it), so please bear with me.
I am using the {gtsummary} package, in particular the tbl_summary() function.
I would like to include a 95% confidence interval of the proportions for each level of the by variable, for all the included categorical and continuous variables.
Searching through previous posts, I haven't found a solution to this exact problem.
My basic output is created using the following:
library(dplyr)
library(gtsummary)

tbl <- df %>%
  select(group, var_cont, var_cat_1, var_cat_2, var_cat_3, var_cat_4) %>%
  tbl_summary(
    by = group,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_dichotomous() ~ "{n}/{N} ({p}%)"
    ),
    missing = "no",
    digits = all_continuous() ~ 1
  )
With my data, this produces the following (image: tbl_summary output).
The majority of my categorical variables are of logical type (i.e. TRUE, FALSE, NA).
I would now like to add a column next to each of the group-level columns containing a 95% confidence interval of the proportions, in the form "{ci_lower}%, {ci_upper}%".
After many attempts, and with inspiration from other posts, I created a custom function that uses the freq_table() function of the {freqtables} package. I wrote the function so that it would fit the add_stat() function of {gtsummary} for tbl_summary tables.
ci_function <- function(data, variable, by, ...) {
  variable <- enquo(variable)
  by <- enquo(by)
  data %>%
    freq_table(!!by, !!variable) %>%
    filter(col_cat == TRUE) %>%
    select(row_cat, col_var, n, n_row, percent_row, lcl_row, ucl_row) %>%
    mutate(
      lcl_row = format(lcl_row, digits = 2),
      ucl_row = format(ucl_row, digits = 2),
      stat = str_glue("{lcl_row}%, {ucl_row}%")
    ) %>%
    select(stat) %>%
    t() %>%
    as_tibble() %>%
    set_names(paste0("add_stat_", seq_len(ncol(.))))
}
Using ci_function alone on a selection of the above variables gives me the following:
# A tibble: 1 x 3
  add_stat_1  add_stat_2   add_stat_3
  <chr>       <chr>        <chr>
1 0.19%, 9.3% 2.53%, 16.9% 0.34%, 16.3%
When I try to pass ci_function to add_stat() like this:
tbl <- stack_overflow %>%
  select(group, var_cont, var_cat_1, var_cat_2, var_cat_3, var_cat_4) %>%
  tbl_summary(
    by = group,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_dichotomous() ~ "{n}/{N} ({p}%)"
    ),
    missing = "no",
    digits = all_continuous() ~ 1
  ) %>%
  add_stat(everything() ~ "ci_function") %>%
  modify_table_body(
    dplyr::relocate, add_stat_1, .after = stat_1
  ) %>%
  modify_header(starts_with("add_stat_") ~ "**95% CI**")
... I get the following error messages (the one for the continuous variable is expected):
There was an error for variable 'var_cont':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_1':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_2':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_3':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_4':
Error: `nm` must be `NULL` or a character vector the same length as `x`
... and incomplete output (image: tbl_summary output).
I am a big fan of the {gtsummary} package and its customization possibilities.
Can anyone help me correct my custom function ci_function so that it works for both categorical and continuous variables, and show me how to use it with the add_stat() function of {gtsummary}?
Cheers!
Steffen
Upvotes: 4
Views: 2212
Reputation: 11680
UPDATE 2022-02-13: The solution now uses the add_ci() function, which reduces the amount of code significantly.
Use the add_ci() function to add columns of confidence intervals.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
  trial %>%
  select(trt, age, response, grade) %>%
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = list(
      all_categorical() ~ "{n}/{N} ({p}%)",
      all_continuous() ~ "{mean} ({sd})"
    )
  ) %>%
  add_ci()
Created on 2022-02-13 by the reprex package (v2.0.1)
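To sanity-check the numbers in a CI column, the per-group interval for a logical variable can also be computed directly in base R. This is only a sketch under stated assumptions: the helper prop_ci_by_group() and the toy data are hypothetical and not part of {gtsummary}, and prop.test() returns a Wilson score interval with continuity correction, which may or may not be the method add_ci() applies in your version (check its documentation).

```r
# Hypothetical helper (not part of gtsummary): for each level of `by`, compute
# the 95% CI of the proportion of TRUE values in a logical `variable` and
# format it as "lower%, upper%".
prop_ci_by_group <- function(data, variable, by, conf.level = 0.95) {
  # `variable` and `by` are column names given as strings
  groups <- split(data[[variable]], data[[by]])
  vapply(groups, function(x) {
    x <- x[!is.na(x)]  # compute the interval on non-missing values only
    ci <- stats::prop.test(sum(x), length(x), conf.level = conf.level)$conf.int
    sprintf("%.1f%%, %.1f%%", 100 * ci[1], 100 * ci[2])
  }, character(1))
}

# Hypothetical toy data: two groups of 20 with 6 and 12 TRUE responses
toy <- data.frame(
  group = rep(c("Drug A", "Drug B"), each = 20),
  response = c(rep(c(TRUE, FALSE), times = c(6, 14)),
               rep(c(TRUE, FALSE), times = c(12, 8)))
)

res <- prop_ci_by_group(toy, "response", "group")
print(res)
```

The result is a named character vector with one formatted interval per group, which can be compared against the corresponding CI columns of the table.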
Upvotes: 4