Maxence Dum.

Reputation: 121

Duplicate rows occurring with overlapping subtotals in expss tables

A new expss-related question, following a situation I ran into earlier today. Using overlapping subtotals while building a crosstable results in duplicate rows: more precisely, every row that belongs to at least two distinct subtotals appears more than once.

Let's use the infert dataset to illustrate this. For the sake of demonstration, we coerce parity to a factor.

library(datasets)
library(expss)  # tab_cells(), tab_subtotal_cells(), tab_pivot(), etc.
infert$parity <- factor(infert$parity)

infert %>%
  tab_cells(parity) %>%
  tab_subtotal_cells("1+2+3"=levels(parity)[1:3], "1+2"=levels(parity)[1:2],
                     position = "above") %>%
  tab_cols(total()) %>%
  tab_stat_cases(label="N", total_row_position="none") %>%
  tab_pivot(stat_position="inside_columns")

The result is self-explanatory. Even though I understand why this happens (each distinct total has to be computed), I would like to know whether there is a clever way to get rid of the duplicates.

And as a subsidiary question: since subtotals can be written and positioned in many different ways, the output order can get quite messy. Is there a function to sort and/or move only specified rows? (That would in fact be the opposite of the excluded_rows parameter of the tab_sort_* functions.) Ideally the output would be sorted like this:

1+2+3
1+2
1
2
3
4
5+6
5
6

Thank you!

Upvotes: 0

Views: 95

Answers (1)

Gregory Demin

Reputation: 4846

If your subtotals have overlapping items, you can hide those items with the surprisingly named function 'hide'. For now there is no function for custom positioning of subtotals. However, we can get your desired output with a small trick:

library(expss)
data("infert")
infert$parity = factor(infert$parity)

infert %>%
    tab_cells(
            subtotal(parity, 
                     "1+2+3"=hide(levels(parity)[1:3]), 
                     "1+2"= levels(parity)[1:2],
                     "3" = hide(levels(parity)[3]), # shows 3 on its own, because the "1+2+3" subtotal hides its items
                     "5+6"=levels(parity)[5:6], 
                     position = "above"
                     )
    ) %>%
    tab_cols(total()) %>%
    tab_stat_cases(label="N", total_row_position="none") %>%
    tab_pivot(stat_position="inside_columns")
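If the subtotal trick does not produce the order you want, one possible workaround is to reorder the rows after tab_pivot(). The sketch below uses a hypothetical helper, move_rows(), written in base R; it is not part of expss. It assumes the pivoted table behaves like a data.frame whose row_labels column holds labels of the form "variable|category", and it matches only the part after the last "|". The demo table is a stand-in with placeholder values, not real infert counts.

```r
# Hypothetical helper (not part of expss): put the rows whose final label
# segment appears in `wanted` first, in that order; every other row keeps its
# original relative order and goes after them.
move_rows <- function(tbl, wanted, sep = "|") {
  labels <- as.character(tbl[["row_labels"]])
  # keep only the text after the last separator, e.g. "parity|1+2" -> "1+2"
  last_part <- vapply(strsplit(labels, sep, fixed = TRUE),
                      function(parts) if (length(parts)) parts[length(parts)] else "",
                      character(1))
  rank <- match(last_part, wanted)
  # unmatched rows are ranked after all wanted rows, in their current order
  rank[is.na(rank)] <- length(wanted) + seq_len(sum(is.na(rank)))
  tbl[order(rank), , drop = FALSE]
}

# Stand-in for a pivoted table; N holds placeholder values, not real counts.
tbl <- data.frame(
  row_labels = c("parity|1", "parity|2", "parity|3", "parity|4",
                 "parity|1+2", "parity|1+2+3"),
  N = seq_len(6),
  stringsAsFactors = FALSE
)

res <- move_rows(tbl, c("1+2+3", "1+2", "1", "2", "3", "4"))
print(res)
```

On a real result you would pass the full desired order, e.g. c("1+2+3", "1+2", "1", "2", "3", "4", "5+6", "5", "6"). Whether subsetting keeps the etable class of an expss result is an assumption worth checking on your version.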

Upvotes: 1
