NewBee

Reputation: 1040

tbl_summary and numeric variables

tbl_summary (from the gtsummary package) does not treat all numeric variables the same way, and I can't figure out how to change that. For example:

mtcars contains only numeric variables, so when I run this I expect the mean of every variable to be calculated. Instead, cyl, gear and carb are treated as categorical.

library(gtsummary)

tbl_summary(mtcars, statistic = list(all_numeric() ~ "{mean} ({sd})",
                                      all_categorical() ~ "{n} / {N} ({p}%)"))

I actually have a much bigger dataset, and tbl_summary is treating some of its numeric variables as categorical. Could it be because there are so few observations (let's say many rows are missing) and tbl_summary does not want to calculate a mean for such a small N?

I can't wrap my mind around this!

Just a further example from my data. Q12_5_TEXT is a numeric variable, but this is the output from tbl_summary.

[Image: tbl_summary output for Q12_5_TEXT]

Upvotes: 3

Views: 3398

Answers (3)

GClarke

Reputation: 35

I had the same issue and fixed it by telling tbl_summary that the variables it guessed to be categorical are in fact continuous. Try:

df %>%
  tbl_summary(
    by = b,
    type = list(
      all_continuous() ~ "continuous2",
      all_categorical() ~ "continuous2"
    ),
    statistic = all_continuous() ~ "{mean} ({sd})"
  )

Upvotes: 2

James Cutler

Reputation: 81

I tried type = all_continuous() ~ "continuous2" with version 1.3.5, and it didn't change the summary type:

library(tidyverse)
library(gtsummary)

nrows <- 30

df <- tibble(
  a = sample(c(0, 1, 3.5, 7.5), nrows, replace = TRUE),
  b = sample(c("Group I", "Group II"), nrows, replace = TRUE)
)

df %>% 
  tbl_summary(
    by = b,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ "{mean} ({sd})"
  )

The output summarized variable 'a' as if it were categorical, in spite of the type argument. I ran into this same issue, which is why I came here for an answer. If there is a different argument I should be using, I would greatly appreciate a pointer!
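A likely explanation, offered as a sketch rather than a confirmed diagnosis: inside type=, all_continuous() seems to select only the variables tbl_summary has already guessed to be continuous, so it matches nothing when 'a' is guessed categorical. Naming the variable directly avoids that. The version below assumes a gtsummary release that supports "continuous2"; the fixed seed and the base data.frame in place of tibble are my additions.

```r
library(gtsummary)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
nrows <- 30

# Same shape as the df above, built with base R to avoid extra dependencies.
df <- data.frame(
  a = sample(c(0, 1, 3.5, 7.5), nrows, replace = TRUE),
  b = sample(c("Group I", "Group II"), nrows, replace = TRUE)
)

tbl <- tbl_summary(
  df,
  by = b,
  # Name the variable explicitly instead of relying on all_continuous(),
  # which would not match 'a' while it is still guessed to be categorical.
  type = a ~ "continuous2",
  statistic = all_continuous() ~ "{mean} ({sd})"
)
tbl
```

In the statistic argument all_continuous() does work here, because by that point 'a' has been assigned a continuous type.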

Upvotes: 3

Daniel D. Sjoberg

Reputation: 11680

Variables with few unique levels are summarized categorically. For example, mtcars$cyl only has three unique levels: 4, 6, 8. With only three levels, a categorical summary is more appropriate than a mean or median.
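To see this in the data, one can count the distinct values per column with base R. This is only a quick check of the data, not something gtsummary itself exposes; n_unique is just an illustrative name.

```r
# Count the distinct values in each mtcars column (base R only).
n_unique <- sapply(mtcars, function(x) length(unique(x)))

# Columns with only a handful of distinct values are the ones that
# tbl_summary() ends up summarizing categorically by default.
print(sort(n_unique))
```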

Use the type= argument to change the default summary type.
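As a minimal sketch for the mtcars example in the question, the three variables can be named explicitly in type= and forced to a continuous summary. The choice of cyl, gear and carb comes from the question; vs and am are left at their defaults here.

```r
library(gtsummary)

tbl <- tbl_summary(
  mtcars,
  # Override the default guess for the variables with few distinct values.
  type = list(c(cyl, gear, carb) ~ "continuous"),
  # Report mean (SD) for every variable summarized as continuous.
  statistic = list(all_continuous() ~ "{mean} ({sd})")
)
tbl
```

Listing the variables by name keeps the override explicit, so any column that really is categorical can stay that way.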

Upvotes: 4
