Henrik

Reputation: 85

gtsummary tbl_summary: sorting of stratify variable

Using package gtsummary version 2.0.0 under R version 4.4.0 (Windows), I have a question concerning the order of the stratifying variable, which seems to get mixed up sometimes.

example:

library(gtsummary)
test <- data.frame(x = 1:5, y = as.character(2010:2024))
tbl_summary(test, by = y) # weird column order (2010/2019/2020:2024/2011:2018)
tbl_summary(subset(test, y >= 2016), by = y) # column order ok (2016:2024)
tbl_summary(subset(test, y >= 2015), by = y) # weird column order (2015/2024/2016:2023)

or:

tbl_summary(data.frame(x = 1, y = LETTERS[1:9]), by = y) # ok
tbl_summary(data.frame(x = 1, y = LETTERS[1:10]), by = y) # strange

So it seems to work as long as there are fewer than 10 categories in y, and gets mixed up when there are 10 or more?

Am I doing something wrong here? How can I get the order right when there are more columns?

Upvotes: 1

Views: 150

Answers (1)

stefan

Reputation: 125418

This looks like a bug to me. The underlying issue is that the stat_xxx columns are ordered alphabetically in the table body. That only becomes a problem when there are 10 or more categories, because stat_10 then sorts directly after stat_1 and before stat_2, i.e. we get stat_1, stat_10, ... instead of stat_1, stat_2, ...:

library(gtsummary)
test <- data.frame(x = 1:5, y = as.character(2010:2024))

tbl <- test |> 
  tbl_summary(by = y)

tbl$table_body
#> # A tibble: 6 × 20
#>   variable var_type    var_label row_type label stat_1   stat_10 stat_11 stat_12
#>   <chr>    <chr>       <chr>     <chr>    <chr> <chr>    <chr>   <chr>   <chr>  
#> 1 x        categorical x         label    x     <NA>     <NA>    <NA>    <NA>   
#> 2 x        categorical x         level    1     1 (100%) 0 (0%)  1 (100… 0 (0%) 
#> 3 x        categorical x         level    2     0 (0%)   0 (0%)  0 (0%)  1 (100…
#> 4 x        categorical x         level    3     0 (0%)   0 (0%)  0 (0%)  0 (0%) 
#> 5 x        categorical x         level    4     0 (0%)   0 (0%)  0 (0%)  0 (0%) 
#> 6 x        categorical x         level    5     0 (0%)   1 (100… 0 (0%)  0 (0%) 
#> # ℹ 11 more variables: stat_13 <chr>, stat_14 <chr>, stat_15 <chr>,
#> #   stat_2 <chr>, stat_3 <chr>, stat_4 <chr>, stat_5 <chr>, stat_6 <chr>,
#> #   stat_7 <chr>, stat_8 <chr>, stat_9 <chr> 
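
The ordering effect can be reproduced without gtsummary. The following is a minimal base R sketch, not the package's internal code: the vector `stat_cols` and the variable names are made up for illustration, and `method = "radix"` is used only to make the sort locale-independent.

```r
# Hypothetical column names, mimicking the stat_* columns of a table body
# with 15 strata (illustration only, not taken from gtsummary internals).
stat_cols <- paste0("stat_", 1:15)

# Alphabetical (character) sort: compares character by character, so
# "stat_10" comes before "stat_2" because "1" sorts before "2".
alphabetical <- sort(stat_cols, method = "radix")

# Numeric sort: strip the prefix, convert the suffix to integer, and
# order by that number instead.
suffix <- as.integer(sub("^stat_", "", alphabetical))
numeric_order <- alphabetical[order(suffix)]

print(head(alphabetical, 4))
print(head(numeric_order, 4))
```

With nine or fewer strata every suffix has a single digit, so both orderings coincide, which matches the observation in the question.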

As a workaround, you can use a small custom function to fix the order, something like:

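tbl_fix_order <- function(x) {
  nms <- names(x$table_body)
  # Split the column names into stat_* columns and everything else
  is_stat_cols <- grepl("^stat_", nms)
  non_stat_cols <- nms[!is_stat_cols]
  stat_cols <- nms[is_stat_cols]
  # Order the stat_* columns by their numeric suffix, not alphabetically
  stat_cols <- stat_cols[
    order(as.integer(gsub("^.*?(\\d+)$", "\\1", stat_cols)))
  ]
  # Reassemble the table body: non-stat columns first, then sorted stat_*
  x$table_body <- x$table_body[c(non_stat_cols, stat_cols)]

  x
}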

tbl |>
  tbl_fix_order()


Upvotes: 2
