Reputation: 23
I am attempting to create a descriptive table using tbl_summary in R that will show both the column percentage and the row percentage for each category. Consider the following example code using the iris R dataset:
library(dplyr)
library(gtsummary)

iris_table <- iris %>%
  mutate(Sepal.Length.Cat = if_else(Sepal.Length > 5, "Big", "Small")) %>%
  tbl_summary(
    by = Species,
    include = c(Sepal.Length.Cat, Sepal.Width),
    type = list(
      Sepal.Width ~ "continuous2"
    ),
    statistic = list(
      all_continuous2() ~ c("{mean} ({sd})", "{min} - {max}"),
      all_categorical() ~ "{n} ({p}%)"
    ),
    missing_text = "Missing"
  ) %>%
  add_overall(col_label = "**Overall** <br>N = {n}") %>%
  modify_footnote(update = everything() ~ NA)

print(iris_table)
Using the above code, I get the default output for {p}, which corresponds to percent = "column". However, I would also like to include a row percent. For example, n = 22 in the cell for the setosa species with "Big" sepal length. Right now, the cell shows 22 (44%), which is 22 / 50 setosa total. I would like it to show something like "22 (44%; 19%)", with the 19% being the additional row percent (i.e., 22 / 118 total Big).
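To make the two denominators concrete, here is a minimal base R sketch of the arithmetic for that one cell. The `size` vector is a stand-in for Sepal.Length.Cat introduced only for this illustration; it is not part of the tbl_summary call.

```r
# Stand-in for Sepal.Length.Cat (name chosen for this illustration only)
size <- ifelse(iris$Sepal.Length > 5, "Big", "Small")
tab <- table(size, iris$Species)

# margin = 2 divides by the column (species) totals, margin = 1 by the row (size) totals
col_pct <- prop.table(tab, margin = 2) * 100
row_pct <- prop.table(tab, margin = 1) * 100

tab["Big", "setosa"]             # count: 22
round(col_pct["Big", "setosa"])  # column percent: 22 / 50  -> 44
round(row_pct["Big", "setosa"])  # row percent:    22 / 118 -> 19
```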
I have attempted using the percent = "row" argument that is built into tbl_summary. However, that not only gets rid of the column percent, but it also changes the percents in the Overall column to be row percents as well. I would like the column percents to stay and the Overall category to remain just column percents.
Upvotes: 2
Views: 415
Reputation: 18714
I have yet to find a method that is built into gtsummary. I was able to create a workaround. Unfortunately, it is not very dynamic or universal; it is highly dependent on the arrangement of the table you used in your question, and it will probably require modification to repurpose.
I created a function that modifies the table, per your request.
library(purrr)   # map(), imap()
library(stringr) # str_replace()

fixer <- function(tbl) { # add row percentages to a table with column percentages
  tabIn <- tbl$inputs
  nms <- names(tabIn$type[tabIn$type %in% "categorical"]) # which columns?
  # calculate the row %; after t() and as.vector() the order is
  # Big (3 species), then Small (3 species)
  rp <- table(tabIn$data[, nms], tabIn$data[, tabIn[["by"]]]) %>%
    prop.table(1) %>% t() %>% as.vector() %>% style_percent()
  # add percentages to the table
  map(2:3, \(j) { # rows 2:3 of table_body hold the Big and Small levels
    imap(paste0("stat_", 1:3), \(k, i) { # for each species column in that row
      val <- tbl$table_body[[j, k]] # collect content
      # three species per level, so the offset into rp is (j - 2) * 3
      val2 <- str_replace(val, "\\)", paste0("/", rp[(j - 2) * 3 + i], "%\\)")) # fix
      tbl$table_body[[j, k]] <<- val2 # add update back to table
    })
  })
  tbl # return updated table
}
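Because the row percentages are flattened into a plain vector before being pasted into the cells, their order matters. The sketch below, in base R only, labels each element of that vector so the mapping can be checked by eye. The `size` and `rp_vec` names are made up for this illustration, and the labelling step is not part of fixer().

```r
# Stand-in for Sepal.Length.Cat (illustrative name)
size <- ifelse(iris$Sepal.Length > 5, "Big", "Small")

# Same steps as in fixer(): row proportions, then transpose to species x size
rp <- table(size, iris$Species) |>
  prop.table(1) |>
  t()

# as.vector() walks the matrix column by column:
# all species for Big first, then all species for Small
rp_vec <- as.vector(rp)
names(rp_vec) <- as.vector(outer(rownames(rp), colnames(rp), paste))
round(rp_vec * 100, 1)
```

With three species per level, the element for level r (1 = Big, 2 = Small) and species column i sits at position (r - 1) * 3 + i.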
You can call this function either on a table you have already created or at the end of the pipeline that creates it. For example:
iris_table %>% fixer() # call table and update
Or as the last step while the table is being built:
library(dplyr)
library(gtsummary)

iris_table <- iris %>%
  mutate(Sepal.Length.Cat = if_else(Sepal.Length > 5, "Big", "Small")) %>%
  tbl_summary(
    by = Species,
    include = c(Sepal.Length.Cat, Sepal.Width),
    type = list(
      Sepal.Width ~ "continuous2"
    ),
    statistic = list(
      all_continuous2() ~ c("{mean} ({sd})", "{min} - {max}"),
      all_categorical() ~ "{n} ({p}%)"
    ),
    percent = "column",
    missing_text = "Missing"
  ) %>%
  add_overall(col_label = "**Overall** <br>N = {n}") %>%
  modify_footnote(update = everything() ~ NA) %>%
  fixer() # <---- I'm new
Check it out
Upvotes: 1
Reputation: 11679
The package wasn't designed to show both row and column percentages. But in the new version of the package, there are more generalizable ways to create bespoke tables. In the example below, we first calculate all the statistics that will appear in the table, then pass them to a new function called tbl_ard_summary(). It's not the easiest code to read, but it does get you both percentages.
library(cards)
library(gtsummary)
packageVersion("gtsummary")
#> [1] '2.0.1.9002'

iris2 <- iris |>
  dplyr::mutate(Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))

# create the primary ARD
ard <- iris2 |>
  ard_stack(
    .by = Species,
    ard_continuous(variables = Sepal.Width),
    ard_categorical(variables = Sepal.Length.Cat),
    .missing = TRUE,
    .attributes = TRUE
  ) |>
  # create ARD for row percentages
  bind_ard(
    ard_categorical(
      iris2,
      by = Species,
      variables = Sepal.Length.Cat,
      statistic = ~"p",
      denominator = "row"
    ) |>
      dplyr::mutate(stat_name = ifelse(stat_name == "p", "p_row", stat_name))
  )

# pass the ARD to gtsummary to create table
ard |>
  tbl_ard_summary(
    by = Species,
    include = c(Sepal.Length.Cat, Sepal.Width),
    type = list(
      Sepal.Width ~ "continuous2"
    ),
    statistic = list(
      all_continuous2() ~ c("{mean} ({sd})", "{min} - {max}"),
      all_categorical() ~ "{n} (Column {p}%; Row {p_row}%)"
    ),
    missing_text = "Missing"
  ) |>
  modify_footnote(all_stat_cols() ~ NA) |>
  as_kable() # convert to kable to display on SO
| Characteristic | setosa | versicolor | virginica |
|---|---|---|---|
| Sepal.Length.Cat | | | |
| Big | 22 (Column 44.0%; Row 18.6%) | 47 (Column 94.0%; Row 39.8%) | 49 (Column 98.0%; Row 41.5%) |
| Small | 28 (Column 56.0%; Row 87.5%) | 3 (Column 6.0%; Row 9.4%) | 1 (Column 2.0%; Row 3.1%) |
| Sepal.Width | | | |
| Mean (SD) | 3.4 (0.4) | 2.8 (0.3) | 3.0 (0.3) |
| Min - Max | 2.3 - 4.4 | 2.0 - 3.4 | 2.2 - 3.8 |
Created on 2024-08-27 with reprex v2.1.1
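If it helps to see what the extra ARD contributes before it is renamed to p_row, the row-percentage piece can be inspected on its own. This is a rough sketch that reuses the same ard_categorical() call as above; it assumes the result stores its values in a list column named stat, as in the cards version that was current when this was written, so treat the column access as an assumption for other versions.

```r
library(cards)

iris2 <- iris |>
  dplyr::mutate(Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))

# Row proportions only: one record per Species x Sepal.Length.Cat level
row_ard <- ard_categorical(
  iris2,
  by = Species,
  variables = Sepal.Length.Cat,
  statistic = ~"p",
  denominator = "row"
)

row_ard

# Assumes a list column `stat` holding the proportions (0 to 1, not percents)
row_props <- unlist(row_ard$stat)
row_props
```

Within each level of Sepal.Length.Cat the proportions add up to 1 across species, which is what makes them row percentages rather than column percentages.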
Upvotes: 3