Reputation: 1040
tbl_summary (from the gtsummary package) does not treat all numeric variables the same way, and I can't figure out how to change that. For example:
mtcars has only numeric variables, so when I run this, I expect the mean of every variable to be calculated. Instead, cyl, gear and carb are treated as categorical.
tbl_summary(mtcars, statistic = list(all_numeric() ~ "{mean} ({sd})",
                                     all_categorical() ~ "{n} / {N} ({p}%)"))
I actually have a much bigger dataset, and tbl_summary is treating some of its numeric variables as categorical. Could that be because there are so few observations (say many rows are missing), and tbl_summary does not want to calculate a mean for such a small N?
I can't wrap my mind around this!
Just a further example from my data. Q12_5_TEXT is a numeric variable, but this is the output from tbl_summary.
Upvotes: 3
Views: 3398
Reputation: 35
I had this same issue and I fixed it by telling tbl_summary that the categorical variables are in fact continuous. Try:
df %>%
  tbl_summary(
    by = b,
    type = list(all_continuous() ~ "continuous2",
                all_categorical() ~ "continuous2"),
    statistic = all_continuous() ~ "{mean} ({sd})"
  )
Upvotes: 2
Reputation: 81
I tried type = all_continuous() ~ "continuous2", and I have version 1.3.5, and it didn't change the summary type:
library(tidyverse)
library(gtsummary)
nrows <- 30
df <- tibble(
  a = sample(c(0, 1, 3.5, 7.5), nrows, replace = TRUE),
  b = sample(c("Group I", "Group II"), nrows, replace = TRUE)
)
df %>%
  tbl_summary(
    by = b,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ "{mean} ({sd})"
  )
The output summarizes variable 'a' as if it were categorical, in spite of the type argument. I ran into the same issue, which is why I came here for an answer. If there is a different argument I should be using, I would greatly appreciate a pointer!
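A possible explanation, offered as a sketch rather than a confirmed diagnosis: inside type=, all_continuous() appears to select only the variables that tbl_summary has already guessed to be continuous. Since 'a' has just four distinct values, it is guessed to be categorical, so the selector matches nothing and the type never changes. Naming the variable directly should work around that. The set.seed() call and the use of data.frame() instead of tibble() are my own additions, made for reproducibility and to keep the example to one package.

```r
library(gtsummary)

set.seed(1)  # assumption: added only so the example is reproducible
nrows <- 30
df <- data.frame(
  a = sample(c(0, 1, 3.5, 7.5), nrows, replace = TRUE),
  b = sample(c("Group I", "Group II"), nrows, replace = TRUE)
)

# Name the variable on the left-hand side instead of using all_continuous(),
# so the override applies regardless of the type that was guessed for 'a'.
tbl <- tbl_summary(
  df,
  by = b,
  type = list(a ~ "continuous"),
  statistic = list(all_continuous() ~ "{mean} ({sd})")
)
tbl
```

Using "continuous2" instead of "continuous" on the right-hand side should behave the same way, except that the statistics are laid out on separate rows.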
Upvotes: 3
Reputation: 11680
Variables with few unique levels are summarized categorically. For example, mtcars$cyl only has three unique levels: 4, 6, 8. With only three levels, a categorical summary is more appropriate than a mean or median.
Use the type= argument to change the default summary type.
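To make that concrete for the mtcars example in the question, here is a minimal sketch. It first counts the distinct values in each column, then overrides the guessed type for every column. As far as I know the documented cutoff is fewer than 10 distinct values for a numeric variable, but treat that number as an assumption and check the help page for your installed version. The names n_unique and tbl are just illustrative.

```r
library(gtsummary)

# Count distinct values per column; low counts explain the categorical default.
n_unique <- sapply(mtcars, function(x) length(unique(x)))
print(n_unique)

# Override the guessed type so every column gets a continuous summary.
tbl <- tbl_summary(
  mtcars,
  type = list(everything() ~ "continuous"),
  statistic = list(all_continuous() ~ "{mean} ({sd})")
)
tbl
```

To change only some columns, put a character vector of names on the left-hand side instead, for example c("cyl", "gear", "carb") ~ "continuous". Note that everything() also converts the 0/1 columns vs and am, which would otherwise be summarized as dichotomous.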
Upvotes: 4