Seon Choi

Reputation: 13

The tbl_summary function does not appear to handle filtered data properly

My data contains a column called 'Disease_subtype', and each entry in the column is one of the following numbers: 0, 1, 2, 3, 4. I filtered out the 0 and 4 entries and tried to summarize only entries 1, 2, and 3. It did not work as I intended: the 0 and 4 entries still showed up as NA in the summary.


I wanted to show a summary table of entries 1, 2, and 3 using the following code:
library(dplyr)      # filter(), %>%
library(gtsummary)  # tbl_summary(), add_p(), bold_labels()

table0 <-
  df.wide.paired.filtered %>%
  # keep only subtypes 1, 2, and 3
  filter(Disease_subtype == 1 | Disease_subtype == 2 | Disease_subtype == 3) %>%
  tbl_summary(
    include = c(Sex, Age, Disease_subtype, EDSS, MS_Tx, T25FW, HPT, domHPT, nondomHPT,
                LCVA, SDMT, PASAT),
    by = Disease_subtype
  ) %>%
  add_p() %>%
  bold_labels()
table0

The result

Upvotes: 1

Views: 396

Answers (2)

ZKA

Reputation: 507

As an alternative to Daniel Sjoberg's approach, you can also pipe the filtered data through droplevels(), which drops unused factor levels:

EDIT:

df.wide.paired.filtered %>% 
    filter(Disease_subtype %in% 1:3) %>% 
    droplevels() %>%
    tbl_summary(include = c(Sex, Age, Disease_subtype, EDSS, MS_Tx, T25FW, HPT, domHPT, nondomHPT, 
                            LCVA, SDMT, PASAT),
              by = Disease_subtype) %>%
    add_p() %>% 
    bold_labels()

Upvotes: 1

Daniel D. Sjoberg

Reputation: 11680

Your by variable, by = Disease_subtype, is a factor, and the tbl_summary() function purposefully shows all levels of a factor, including levels with no observations. If you do not want them to appear in the table, add mutate(Disease_subtype = factor(Disease_subtype)) after your filter(); this removes the unobserved factor levels.
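To see why the filtered-out subtypes linger, here is a minimal base R sketch on a made-up data frame (the values are hypothetical stand-ins, not the asker's data). It assumes Disease_subtype is stored as a factor, as described above, and shows that subsetting rows keeps the full level set, while factor() or droplevels() trims it to the observed values.

```r
# Toy stand-in for the question's data; values are invented for illustration
df <- data.frame(
  Disease_subtype = factor(c(0, 1, 2, 3, 4, 1, 2)),
  Age = c(30, 41, 52, 47, 38, 60, 55)
)

# Keeping only subtypes 1 to 3 removes the rows, not the factor levels
kept <- df[df$Disease_subtype %in% 1:3, ]
levels(kept$Disease_subtype)
# → "0" "1" "2" "3" "4"

# Option 1: rebuild the factor so only observed values become levels
refactored <- kept
refactored$Disease_subtype <- factor(refactored$Disease_subtype)
levels(refactored$Disease_subtype)
# → "1" "2" "3"

# Option 2: droplevels() drops unused levels from every factor column
dropped <- droplevels(kept)
levels(dropped$Disease_subtype)
# → "1" "2" "3"
```

In the dplyr pipeline from the question, Option 1 corresponds to adding mutate(Disease_subtype = factor(Disease_subtype)) right after filter(). Note that droplevels() on a data frame affects all factor columns, so the mutate() form is the narrower choice if other factors should keep their empty levels.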

Upvotes: 0
