Robyn Husa

Reputation: 23

How to get column and row percent when using tbl_summary in R?

I am attempting to create a descriptive table with tbl_summary in R that shows both the column percentage and the row percentage for each category. Consider the following example, which uses R's built-in iris dataset:

library(gtsummary)
library(dplyr)

iris_table <- iris %>%
  mutate(Sepal.Length.Cat = if_else(Sepal.Length > 5, "Big","Small")) %>% 
  tbl_summary(by = Species,
              include = c(Sepal.Length.Cat, Sepal.Width),
              type = list(
                Sepal.Width ~ "continuous2"
              ),
              statistic = list(
                all_continuous2() ~ c("{mean} ({sd})","{min} - {max}"),
                all_categorical() ~ "{n} ({p}%)"
              ),
              missing_text = "Missing") %>% 
  add_overall(col_label = "**Overall** <br>N = {n}") %>% 
  modify_footnote(update = everything() ~ NA)
print(iris_table)

Using the above code, I get the default output for the {p} statistic, which uses percent = "column". However, I would also like to include a row percent. For example, the cell for the setosa species with "Big" sepal length has n = 22. Right now, that cell shows 22 (44%), where 44% is 22 / 50 setosa total. I would like it to show something like "22 (44%; 19%)", with 19% being the additional row percent (i.e., 22 / 118 total Big).
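To make the two denominators concrete, here is a base-R sketch (no gtsummary involved) that computes both percentages for this example with table() and prop.table(). The names iris2, tab, col_pct, row_pct and cells are illustrative only; margin = 2 divides by the species total and margin = 1 divides by the category total.

```r
# Base-R sketch: column % and row % for the iris example (no gtsummary needed)
iris2 <- transform(iris, Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))

# counts: rows = sepal-length category, columns = species
tab <- table(iris2$Sepal.Length.Cat, iris2$Species)

col_pct <- 100 * prop.table(tab, margin = 2)  # denominator: species total (e.g. 50 setosa)
row_pct <- 100 * prop.table(tab, margin = 1)  # denominator: category total (e.g. 118 Big)

# format every cell as "n (column%; row%)", rounding to whole percents
cells <- matrix(
  sprintf("%d (%.0f%%; %.0f%%)", as.vector(tab), as.vector(col_pct), as.vector(row_pct)),
  nrow = nrow(tab),
  dimnames = dimnames(tab)
)
print(noquote(cells))
```

The setosa/Big cell comes out as 22 (44%; 19%), the format described above.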

I have tried the built-in percent = "row" argument of tbl_summary. However, that replaces the column percent instead of adding to it, and it also changes the percents in the Overall column to row percents. I would like to keep the column percents and have the Overall column show only column percents.
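A row percent in the Overall column is a category's pooled count divided by itself, i.e. always 100%, so the Overall column only carries information as a column percent (pooled count over all 150 flowers). A base-R sketch of that column, with illustrative names only:

```r
# Base-R sketch: the Overall column with column percentages only
iris2 <- transform(iris, Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))
tab <- table(iris2$Sepal.Length.Cat, iris2$Species)

overall_n <- rowSums(tab)                        # pool all species: Big = 118, Small = 32
overall_pct <- 100 * overall_n / sum(overall_n)  # denominator: all 150 flowers
overall <- sprintf("%d (%.0f%%)", overall_n, overall_pct)
names(overall) <- names(overall_n)
print(noquote(overall))
```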

Upvotes: 2

Views: 415

Answers (2)

Kat

Reputation: 18714

I have yet to find a method built into gtsummary, but I was able to create a workaround. Unfortunately, it is not very dynamic or general; it depends heavily on the layout of the table in your question, so it will probably need modification to repurpose.

I created a function that modifies the table, per your request.

library(gtsummary)
library(dplyr)    # %>%
library(purrr)    # map(), imap()
library(stringr)  # str_replace()

fixer <- function(tbl) {   # add row percentages to table with column percentages
  tabIn <- tbl$inputs
  nms <- names(tabIn$type[tabIn$type %in% "categorical"]) # which columns?
  # counts: category levels in rows, `by` groups in columns
  tab <- table(tabIn$data[, nms], tabIn$data[, tabIn[["by"]]])
  n_grp <- ncol(tab)       # number of stat_ columns (3 species)
                           # calculate the row %, arrange/format as per table:
                           # all groups for "Big" first, then all groups for "Small"
  rp <- tab %>% prop.table(1) %>% t() %>% as.vector() %>% style_percent()
                           # add percentages to the table
  map(2:3, \(j) {                                # for each level row (2 = Big, 3 = Small)
    imap(paste0("stat_", seq_len(n_grp)), \(k, i) {  # for the stat in each row
      val <- tbl$table_body[[j, k]]              # collect content
      val2 <- str_replace(val, "\\)", paste0("/", rp[(j - 2) * n_grp + i], "%\\)")) # fix
      tbl$table_body[[j, k]] <<- val2            # add update back to table
      })
    })
  tbl          # return updated table
  }
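The trickiest part of this workaround is matching each row percentage to the right cell. Below is a base-R sketch of that mapping, assuming (as in the question's table) that the two Sepal.Length.Cat levels sit in rows 2 and 3 of table_body and the species are in stat_1 to stat_3. The helpers lookup() and append_row_pct() are hypothetical and exist only for illustration: the row proportion for table row j and column stat_i is element (j - 2) * n_grp + i of the flattened vector.

```r
# Base-R sketch of how row percentages line up with table cells
iris2 <- transform(iris, Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))
tab <- table(iris2$Sepal.Length.Cat, iris2$Species)

# prop.table(tab, 1) gives row proportions; t() puts species in rows, so
# as.vector() (column-major) lists all species for "Big", then all for "Small"
rp <- as.vector(t(prop.table(tab, margin = 1)))
n_grp <- ncol(tab)  # number of stat_ columns (3 species)

# hypothetical helper: row proportion for table_body row j (2 = "Big", 3 = "Small")
# and column stat_i (i = 1..n_grp)
lookup <- function(j, i) rp[(j - 2) * n_grp + i]

# hypothetical helper: put the row % inside the existing parentheses,
# e.g. "22 (44%)" -> "22 (44%/19%)"
append_row_pct <- function(cell, row_prop) {
  sub(")", paste0("/", round(100 * row_prop), "%)"), cell, fixed = TRUE)
}

append_row_pct("22 (44%)", lookup(2, 1))  # setosa, Big
append_row_pct("28 (56%)", lookup(3, 1))  # setosa, Small
```

Indexing with (j - 1) * i instead would only be right for the first level row; for the "Small" row it would pick elements 2, 4 and 6, mixing in "Big" percentages.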

You can call this function on a table you have already created, for example:

iris_table %>% fixer()   # call table and update

Or add it as the final step when building the table:

library(gtsummary)
library(dplyr)   # mutate(), if_else(), %>%

iris_table <- iris %>%
  mutate(Sepal.Length.Cat = if_else(Sepal.Length > 5, "Big","Small")) %>% 
  tbl_summary(by = Species,
              include = c(Sepal.Length.Cat, Sepal.Width),
              type = list(
                Sepal.Width ~ "continuous2"
              ),
              statistic = list(
                all_continuous2() ~ c("{mean} ({sd})","{min} - {max}"),
                all_categorical() ~ "{n} ({p}%)"
              ),
              percent = "column",
              missing_text = "Missing") %>% 
  add_overall(col_label = "**Overall** <br>N = {n}") %>% 
  modify_footnote(update = everything() ~ NA) %>% fixer()   # <---- I'm new

Check it out:

[Image: updated table]

Upvotes: 1

Daniel D. Sjoberg

Reputation: 11679

The package wasn't designed to show both row and column percentages. However, the new version of the package (v2.0) offers more general ways to create bespoke tables. In the example below, we first calculate all the statistics that will appear in the table as analysis results data (ARD) with the cards package, then pass them to a new function called tbl_ard_summary(). It's not the easiest code to read, but it does get you both percentages.
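The underlying idea is "compute every statistic first in a long table, format later through a template string". The sketch below illustrates that idea in base R only; it does not use cards, and stats, long_stat(), fill_template() and format_cell() are hypothetical names, not part of the cards or gtsummary API. The template mirrors the one used in the gtsummary code below.

```r
# Base-R sketch of "compute statistics first, format later" (not the cards API)
iris2 <- transform(iris, Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))
tab <- table(level = iris2$Sepal.Length.Cat, group = iris2$Species)

# hypothetical helper: long table with one row per (level, group, statistic)
long_stat <- function(x, name) {
  data.frame(as.data.frame(as.table(x), responseName = "stat"), stat_name = name)
}
stats <- rbind(
  long_stat(tab, "n"),                          # counts
  long_stat(100 * prop.table(tab, 2), "p"),     # column %
  long_stat(100 * prop.table(tab, 1), "p_row")  # row %
)

# hypothetical helper: replace each "{name}" placeholder with its value
fill_template <- function(template, values) {
  for (nm in names(values)) {
    template <- gsub(paste0("{", nm, "}"), values[[nm]], template, fixed = TRUE)
  }
  template
}

# hypothetical helper: render one table cell from the long statistics table
format_cell <- function(lvl, grp, template = "{n} (Column {p}%; Row {p_row}%)") {
  s <- stats[stats$level == lvl & stats$group == grp, ]
  v <- setNames(s$stat, s$stat_name)
  fill_template(template, list(
    n = format(v[["n"]]),
    p = sprintf("%.1f", v[["p"]]),
    p_row = sprintf("%.1f", v[["p_row"]])
  ))
}

format_cell("Big", "setosa")
```

Because the row percentage is just another named statistic, adding it does not disturb the column percentage, which is the same reason the ARD approach below can show both.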

library(cards)
library(gtsummary)
packageVersion("gtsummary")
#> [1] '2.0.1.9002'

iris2 <- iris |> 
  dplyr::mutate(Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big","Small"))

# create the primary ARD
ard <- iris2 |>
  ard_stack(
    .by = Species,
    ard_continuous(variables = Sepal.Width),
    ard_categorical(variables = Sepal.Length.Cat),
    .missing = TRUE,
    .attributes = TRUE
  ) |> 
  # create ARD for row percentages
  bind_ard(
    ard_categorical(iris2, by = Species, variables = Sepal.Length.Cat, statistic = ~"p", denominator = "row") |> 
      dplyr::mutate(stat_name = ifelse(stat_name == "p", "p_row", stat_name))
  )


# pass the ARD to gtsummary to create table
ard |> 
  tbl_ard_summary(
    by = Species,
    include = c(Sepal.Length.Cat, Sepal.Width),
    type = list(
      Sepal.Width ~ "continuous2"
    ),
    statistic = list(
      all_continuous2() ~ c("{mean} ({sd})","{min} - {max}"),
      all_categorical() ~ "{n} (Column {p}%; Row {p_row}%)"
    ),
    missing_text = "Missing"
  ) |> 
  modify_footnote(all_stat_cols() ~ NA) |> 
  as_kable() # convert to kable to display on SO

| Characteristic | setosa | versicolor | virginica |
|:---|:---|:---|:---|
| Sepal.Length.Cat | | | |
| Big | 22 (Column 44.0%; Row 18.6%) | 47 (Column 94.0%; Row 39.8%) | 49 (Column 98.0%; Row 41.5%) |
| Small | 28 (Column 56.0%; Row 87.5%) | 3 (Column 6.0%; Row 9.4%) | 1 (Column 2.0%; Row 3.1%) |
| Sepal.Width | | | |
| Mean (SD) | 3.4 (0.4) | 2.8 (0.3) | 3.0 (0.3) |
| Min - Max | 2.3 - 4.4 | 2.0 - 3.4 | 2.2 - 3.8 |

Created on 2024-08-27 with reprex v2.1.1
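A quick way to confirm that the row percentages use the row denominator is that they add up to 100% across the species for each category, up to display rounding (the Big row above shows 18.6 + 39.8 + 41.5 = 99.9). A base-R sketch of that check, with illustrative names:

```r
# Sanity check: row percentages sum to 100 across species (up to rounding)
iris2 <- transform(iris, Sepal.Length.Cat = ifelse(Sepal.Length > 5, "Big", "Small"))
tab <- table(iris2$Sepal.Length.Cat, iris2$Species)

p_row <- 100 * prop.table(tab, margin = 1)
rowSums(p_row)            # 100 for each category (up to floating-point error)
rowSums(round(p_row, 1))  # displayed values can drift slightly from 100
```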

Upvotes: 3
