How to stack multiple columns of data.table into a single one?

Question

Let's consider following snippet which has a data.table , and I'm calculating some aggregates, and I've added the sample output as well:

data <- data.table(col=sample(c("col1", "col2", "col3"), 20, replace=TRUE),
          id=as.character(sample(1:5, 20, replace=TRUE)),
          sex=sample(c("m", "f"), 20, replace=TRUE),
          val=sample.int(100, 20)
           )
res <- data[,.(n=.N,  mean=mean(val), total=sum(val)), by=.(col, id, sex)]

# res

A data.table: 13 × 6
col     id      sex     n       mean        total
                   
col2    5       f       3       39.33333    118
col1    3       f       2       18.50000    37
col3    5       f       1       7.00000     7
col2    4       f       1       46.00000    46
col1    2       f       2       29.00000    58
col1    5       f       2       37.00000    74
col2    1       m       1       37.00000    37
col1    3       m       2       29.50000    59
col1    4       m       1       26.00000    26
col2    4       m       2       54.00000    108
col3    5       m       1       60.00000    60
col2    2       f       1       70.00000    70
col2    5       m       1       29.00000    29

After calculating the aggregate, I'm pivoting the data.table on col column using dcast which results something like below:

dcast(res, id+sex~col, value.var=c("n", "mean", "total"))

#output:
A data.table: 9 × 11
id      sex     n_col1  n_col2  n_col3  mean_col1   mean_col2   mean_col3   total_col1  total_col2  total_col3
                                                  
1       f       NA      1       1       NA          31          32.0        NA          31          32
1       m       1       1       NA      25          12          NA          25          12          NA
2       f       2       1       NA      48          51          NA          96          51          NA
2       m       1       2       NA      96          80          NA          96          160         NA
3       m       NA      NA      2       NA          NA          40.5        NA          NA          81
4       f       2       1       NA      43          22          NA          86          22          NA
4       m       NA      2       NA      NA          43          NA          NA          86          NA
5       f       1       1       NA      19          85          NA          19          85          NA
5       m       1       NA      NA      23          NA          NA          23          NA          NA

Now, the question is, how can we stack id, and sex columns together to create a single column, with additional level column which tells the level for the row along with NA values for outer column rows (indicated by <- below), following is the output I'm looking for:

newCol  level   n_col1  n_col2  n_col3  mean_col1   mean_col2   mean_col3   total_col1  total_col2  total_col3
                                                  
1       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       NA      1       1       NA          31          32.0        NA          31          32
m       2       1       1       NA      25          12          NA          25          12          NA
2       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       2       1       NA      48          51          NA          96          51          NA
m       2       1       2       NA      96          80          NA          96          160         NA
3       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
m       2       NA      NA      2       NA          NA          40.5        NA          NA          81
4       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       2       1       NA      43          22          NA          86          22          NA
m       2       NA      2       NA      NA          43          NA          NA          86          NA
5       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       1       1       NA      19          85          NA          19          85          NA
m       2       1       NA      NA      23          NA          NA          23          NA          NA

Ideally, there can be more columns to stack and level can go up to 5. for two levels 1, and 2, NA rows are required only for level 1, but for 4 levels (meaning 4 stacking columns), NA rows are required for level 1, 2, and 3.

The straight forward appraoch is to iterate the unique values, and keep joining the dataframes which is ofcourse not going to be an efficient approach. You can find the Python pandas based question and answer for the above problem at: Pandas stack multiple columns to a single column

How to stack multiple columns of data.table into a single one?

Answers (1)

Related Questions