Reputation: 180
I was creating a summary table for an article and came across the following behaviour that I cannot understand. Two columns of the data frame report the min and max pressure, as follows:
a <- c(80, 80, 80, 80, 80, 80, 80, 80, 70, 70, 75, 75, 70, 65, 60, 80, 75, 70, 80, 80, 80, 80, 80, 70, 80, 70, 75, 80, 70, 65, 70, 75, 70, 75, 80, 65, 85, 75, 70, 70, 70, 75, 80, 80, 70, 70, 80, 70, 80, 60, 80, 80, 70, 70, 85, 70, 70, 80, 70, 70, 75, 75, 70, 70, 70,
70, 70, 80, 80, 70)
b <- c(120, 120, 120, 120, 120, 120, 120, 120, 120, 125, 120, 135, 130, 120, 115, 110, 125, 120, 130, 125, 110, 120, 130, 110, 125, 130, 105, 100, 110, 110, 130, 120, 110, 120, 135, 125, 145, 135, 130, 110, 115, 145, 120, 125, 100, 120, 120, 130,
115, 120, 110, 160, 120, 130, 155, 125, 135, 155, 110, 130, 145, 155, 125, 130, 140, 110, 100, 150, 130, 130)
library(dplyr)      # provides the %>% pipe
library(gtsummary)  # provides tbl_summary()
pressure <- data.frame(a, b)
str(pressure)
pressure %>% tbl_summary()
and the result is the following
So for b I got the expected behaviour, while a is formatted as categorical, I guess. No change I made (forcing it to double, adding decimals, etc.) got a formatted like b. If I shorten the vectors the behaviour is different and they both look alike. I've also forced the output with
pressure %>% tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})"))
but I keep getting the same results. Any help appreciated.
Upvotes: 1
Views: 227
Reputation: 3901
This appears to be the default behavior of tbl_summary(): any numeric variable with fewer than 10 unique values is interpreted as categorical. You can observe this by running the following:
library(tidyverse)
library(gtsummary)
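# Five columns (a to e) with 8, 9, 10, 11 and 12 unique values respectively
d <- map_dfc(8:12, \(x) rep(1:x, length.out = 100)) |>
  set_names(letters[1:5])
# a and b (fewer than 10 unique values) are summarised as categorical
d |>
  tbl_summary()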
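A rough way to see which columns fall under that threshold is to count the distinct values per column yourself. The sketch below uses only base R. The `demo` data frame is made-up stand-in data (not the original vectors) that mimics the question's situation, with 6 distinct values in `a` and 13 in `b`, and the cutoff of 10 is taken from the rule described above.

```r
# Stand-in data (hypothetical): a has 6 distinct values, b has 13
demo <- data.frame(
  a = rep(c(60, 65, 70, 75, 80, 85), length.out = 70),
  b = rep(seq(100, 160, by = 5), length.out = 70)
)

# Number of distinct values in each column
n_unique <- sapply(demo, function(x) length(unique(x)))

# Columns that the "fewer than 10 unique values" rule would flag
likely_categorical <- n_unique < 10

print(n_unique)
print(likely_categorical)
```

Here `a` is flagged and `b` is not, which matches what the question describes.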
This behavior can be overridden by specifying the type of the problematic variables:
d |>
  tbl_summary(type = list(c(a, b, c) ~ "continuous"))
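Applied to the question, the same idea might look like the sketch below. It assumes gtsummary is installed and uses a made-up stand-in for the `pressure` data frame (the real vectors from the question would be used in practice). Only `a` needs the override, since `b` already has enough distinct values to be treated as continuous.

```r
library(gtsummary)

# Stand-in for the question's data (hypothetical values)
pressure <- data.frame(
  a = rep(c(60, 65, 70, 75, 80, 85), length.out = 70),
  b = rep(seq(100, 160, by = 5), length.out = 70)
)

# Force a to be summarised as continuous, then apply the mean (sd) format
tbl <- pressure |>
  tbl_summary(
    type = list(a ~ "continuous"),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  )

tbl
```

With the type set explicitly, the `statistic` argument from the question should then apply to both columns.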
Upvotes: 2