Elliott Chinn

Reputation: 207

How to add percentage to "unknown" in gtsummary

I have a continuous variable with a substantial proportion of unknown (missing) values. My advisor has asked me to show the percentage next to the unknown count in the table. This reprex mimics what I am trying to do.

library(tidyverse)
library(gtsummary)

trial %>%   # included with gtsummary package
  select(trt, age, grade) %>%
  tbl_summary()

I would like the percentage of unknowns shown next to the unknown count, ideally in parentheses, e.g. 11 (5.5%).

Some commenters asked how the missing data appear in my dataset. Here is a reprex of that:

library(gtsummary)
library(tidyverse)

df <-
  tibble::tribble(
    ~age, ~sex,     ~race,              ~weight,
    70,   "male",   "white",            50,
    57,   "female", "african-american", 87,
    64,   "male",   "white",            NA,
    46,   "male",   "white",            49,
    87,   "male",   "hispanic",         51
  )

df %>%
  select(age,sex,race,weight) %>%
  tbl_summary(type = list(age ~ "continuous", weight ~ "continuous"), missing="ifany")

Upvotes: 7

Views: 4793

Answers (1)

Daniel D. Sjoberg

Reputation: 11680

There are a few ways to report the missing rate. I'll illustrate three below, and you can pick the one that suits you best.

  1. Categorical variables: I recommend making the missing values an explicit factor level before passing the data frame to tbl_summary(). The NA values will then no longer be treated as missing, and will be counted like any other level of the variable.
  2. Continuous variables: Use the statistic= argument to report the rate of missingness.
  3. All variables: Use add_n() to report the rate of missingness.

library(gtsummary)

trial %>%      
  select(age, response, trt) %>%
  # making the NA value an explicit level of the factor with `forcats::fct_explicit_na()`
  # (deprecated in forcats 1.0.0 in favor of `forcats::fct_na_value_to_level()`)
  dplyr::mutate(response = factor(response) %>% forcats::fct_explicit_na()) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{N_nonmiss}/{N_obs} {p_nonmiss}%",
                                     "{median} ({p25}, {p75})")
  ) %>%
  add_n(statistic = "{n} / {N}")

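To see what making the missing values an explicit level does to the counts, here is a minimal base R sketch. It does not use gtsummary or forcats: the `race` vector is made up for illustration (the question's values with one entry set to NA), and `addNA()` stands in for `fct_explicit_na()`.

```r
# Hypothetical categorical variable with one missing value
race <- c("white", "african-american", NA, "white", "hispanic")

# Turn NA into a real factor level, then rename that level to "Unknown"
race_f <- addNA(factor(race))
levels(race_f)[is.na(levels(race_f))] <- "Unknown"

# "Unknown" is now tabulated like any other level
counts <- table(race_f)
pct    <- round(100 * prop.table(counts), 1)

print(counts)
print(pct)
```

Once "Unknown" is an ordinary level, a tabulation reports its count and percentage alongside the other levels, which is the behaviour item 1 relies on.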

EDIT: Adding another example after comments from the original poster.

library(gtsummary)

trial %>%      
  select(age, response, trt) %>%
  # making the NA value an explicit level of the factor with `forcats::fct_explicit_na()`
  dplyr::mutate(response = factor(response) %>% forcats::fct_explicit_na(na_level = "Unknown")) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    missing = "no",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})",
                                     "{N_miss} ({p_miss}%)")
  ) %>%
  # updating the label of the missing-data row to "Unknown" in `.$table_body`
  modify_table_body(
    dplyr::mutate,
    label = ifelse(label == "N missing (% missing)",
                   "Unknown",
                   label)
  )

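For reference, the `{N_miss}` and `{p_miss}` placeholders correspond to the count and percentage of missing values. Here is a minimal base R sketch of that arithmetic, using the `weight` column from the question's example data. gtsummary is not involved, and its rounding of the percentage may differ from the `sprintf()` format assumed here.

```r
# weight column from the question's example data; one of five values is NA
weight <- c(50, 87, NA, 49, 51)

n_obs  <- length(weight)         # total number of observations
n_miss <- sum(is.na(weight))     # number of missing values
p_miss <- 100 * n_miss / n_obs   # percentage of missing values

# Format as "count (percent%)", the layout asked for in the question
unknown_cell <- sprintf("%d (%.1f%%)", n_miss, p_miss)
print(unknown_cell)
```

This is a quick way to check by hand that the "Unknown" row in the table shows the figures you expect.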

Upvotes: 12
