The tbl_summary function does not appear to handle filtered data properly

Question

My data contains a column called 'Disease_subtype', and each entry in the column is one of the following numbers: 0, 1, 2, 3, 4. I filtered out the 0 and 4 entries and tried to show a summary of the 1, 2, and 3 entries only. It did not work as I intended: the 0 and 4 entries still showed up in the summary, as NA.


I wanted to show a summary table of the 1, 2, and 3 entries using the following code:
library(dplyr)
library(gtsummary)

table0 <-
  df.wide.paired.filtered %>%
  filter(Disease_subtype == 1 | Disease_subtype == 2 | Disease_subtype == 3) %>%
  tbl_summary(include = c(Sex, Age, Disease_subtype, EDSS, MS_Tx, T25FW, HPT, domHPT, nondomHPT,
                          LCVA, SDMT, PASAT),
              by = Disease_subtype) %>%
  add_p() %>%
  bold_labels()
table0

The result

Daniel D. Sjoberg · Accepted Answer

Your by variable, by = Disease_subtype, is a factor, and tbl_summary() purposefully shows all levels of a factor, including levels with no observations. filter() removes the rows but leaves the factor's levels unchanged, so 0 and 4 are still levels of the column. If you do not want them to appear in the table, add mutate(Disease_subtype = factor(Disease_subtype)) after your filter(); this removes the unobserved factor levels.
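To see why the filtered-out values linger, here is a minimal base R sketch of the behaviour described above. The data frame df and its values are hypothetical stand-ins for the asker's data, not taken from the question.

```r
# Hypothetical toy data; the real data frame in the question is not shown
df <- data.frame(
  Disease_subtype = factor(c(0, 1, 2, 3, 4, 1, 2)),
  Age = c(30, 41, 52, 38, 60, 45, 33)
)

# Subsetting rows keeps every original level attached to the factor
kept <- df[df$Disease_subtype %in% c("1", "2", "3"), ]
levels_before <- levels(kept$Disease_subtype)

# Re-creating the factor keeps only the levels that still occur
kept$Disease_subtype <- factor(kept$Disease_subtype)
levels_after <- levels(kept$Disease_subtype)

print(levels_before)
print(levels_after)
```

In the pipeline from the question, the equivalent step would be the mutate() call suggested above, placed between filter() and tbl_summary(). Base R's droplevels() should have the same effect on the column.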
