Reputation: 85
Using package gtsummary version 2.0.0 under R version 4.4.0 (Windows), I have a question about the column order of the stratifying (by) variable, which sometimes seems to get mixed up.
Example:
library(gtsummary)
test <- data.frame(x = 1:5, y = as.character(2010:2024))
tbl_summary(test, by = y) # weird column order (2010/2019/2020:2024/2011:2018)
tbl_summary(subset(test, y >= 2016), by = y) # column order ok (2016:2024)
tbl_summary(subset(test, y >= 2015), by = y) # weird column order (2015/2024/2016:2023)
Or:
tbl_summary(data.frame(x = 1, y = LETTERS[1:9]), by = y) # ok
tbl_summary(data.frame(x = 1, y = LETTERS[1:10]), by = y) # strange
So it seems to work as long as there are fewer than 10 categories in y, and the order gets mixed up when there are 10 or more?
Am I doing something wrong here? How can I get the columns in the right order when there are 10 or more of them?
Upvotes: 1
Views: 150
Reputation: 125418
This looks like a bug to me. The underlying issue is that the stat_xxx columns are ordered alphabetically in the table body. This only becomes a problem when there are 10 or more categories, because then the order is stat_1, stat_10, ... instead of stat_1, stat_2, ...:
library(gtsummary)
test <- data.frame(x = 1:5, y = as.character(2010:2024))
tbl <- test |>
  tbl_summary(by = y)
tbl$table_body
#> # A tibble: 6 × 20
#> variable var_type var_label row_type label stat_1 stat_10 stat_11 stat_12
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 x categorical x label x <NA> <NA> <NA> <NA>
#> 2 x categorical x level 1 1 (100%) 0 (0%) 1 (100… 0 (0%)
#> 3 x categorical x level 2 0 (0%) 0 (0%) 0 (0%) 1 (100…
#> 4 x categorical x level 3 0 (0%) 0 (0%) 0 (0%) 0 (0%)
#> 5 x categorical x level 4 0 (0%) 0 (0%) 0 (0%) 0 (0%)
#> 6 x categorical x level 5 0 (0%) 1 (100… 0 (0%) 0 (0%)
#> # ℹ 11 more variables: stat_13 <chr>, stat_14 <chr>, stat_15 <chr>,
#> # stat_2 <chr>, stat_3 <chr>, stat_4 <chr>, stat_5 <chr>, stat_6 <chr>,
#> # stat_7 <chr>, stat_8 <chr>, stat_9 <chr>
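To see why the order only breaks at 10 or more groups, here is a small base-R illustration that needs no gtsummary (the column names are made up to mimic the table body). Character sorting compares strings position by position, so "stat_10" sorts before "stat_2"; ordering on the integer suffix instead restores 1, 2, ..., 10.

```r
# Illustrative column names mimicking a table body with 10 by-groups
stat_cols <- paste0("stat_", 1:10)

# Alphabetical (character) sorting: "stat_10" lands between "stat_1" and "stat_2"
alpha_order <- sort(stat_cols)
print(alpha_order)

# Order by the integer suffix instead, which restores stat_1, stat_2, ..., stat_10
num_order <- alpha_order[order(as.integer(sub("^stat_", "", alpha_order)))]
print(num_order)
```

The same "extract the number, then order on it" idea is what the workaround below applies to the real table body.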
As a workaround, you can use a small custom function that reorders the stat_ columns numerically, something like:
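tbl_fix_order <- function(x) {
  nms <- names(x$table_body)
  # only the numbered by-group columns (stat_0, stat_1, ..., stat_10, ...);
  # other columns such as stat_label or p.value are left untouched
  is_stat_cols <- grepl("^stat_\\d+$", nms)
  stat_cols <- nms[is_stat_cols]
  # sort them by their integer suffix instead of alphabetically
  stat_cols <- stat_cols[order(as.integer(sub("^stat_", "", stat_cols)))]
  # put the sorted stat columns back into the slots they occupied
  nms[is_stat_cols] <- stat_cols
  x$table_body <- x$table_body[nms]
  x
}
tbl |>
  tbl_fix_order()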
Upvotes: 2