ThePyGuy
ThePyGuy

Reputation: 18426

How to stack multiple columns of data.table into a single one?

Let's consider following snippet which has a data.table , and I'm calculating some aggregates, and I've added the sample output as well:

data <- data.table(col=sample(c("col1", "col2", "col3"), 20, replace=TRUE),
          id=as.character(sample(1:5, 20, replace=TRUE)),
          sex=sample(c("m", "f"), 20, replace=TRUE),
          val=sample.int(100, 20)
           )
res <- data[,.(n=.N,  mean=mean(val), total=sum(val)), by=.(col, id, sex)]

# res

A data.table: 13 × 6
col     id      sex     n       mean        total
<chr>   <chr>   <chr>   <int>   <dbl>       <int>
col2    5       f       3       39.33333    118
col1    3       f       2       18.50000    37
col3    5       f       1       7.00000     7
col2    4       f       1       46.00000    46
col1    2       f       2       29.00000    58
col1    5       f       2       37.00000    74
col2    1       m       1       37.00000    37
col1    3       m       2       29.50000    59
col1    4       m       1       26.00000    26
col2    4       m       2       54.00000    108
col3    5       m       1       60.00000    60
col2    2       f       1       70.00000    70
col2    5       m       1       29.00000    29

After calculating the aggregate, I'm pivoting the data.table on col column using dcast which results something like below:

dcast(res, id+sex~col, value.var=c("n", "mean", "total"))

#output:
A data.table: 9 × 11
id      sex     n_col1  n_col2  n_col3  mean_col1   mean_col2   mean_col3   total_col1  total_col2  total_col3
<chr>   <chr>   <int>   <int>   <int>   <dbl>       <dbl>       <dbl>       <int>       <int>       <int>
1       f       NA      1       1       NA          31          32.0        NA          31          32
1       m       1       1       NA      25          12          NA          25          12          NA
2       f       2       1       NA      48          51          NA          96          51          NA
2       m       1       2       NA      96          80          NA          96          160         NA
3       m       NA      NA      2       NA          NA          40.5        NA          NA          81
4       f       2       1       NA      43          22          NA          86          22          NA
4       m       NA      2       NA      NA          43          NA          NA          86          NA
5       f       1       1       NA      19          85          NA          19          85          NA
5       m       1       NA      NA      23          NA          NA          23          NA          NA

Now, the question is, how can we stack id, and sex columns together to create a single column, with additional level column which tells the level for the row along with NA values for outer column rows (indicated by <- below), following is the output I'm looking for:

newCol  level   n_col1  n_col2  n_col3  mean_col1   mean_col2   mean_col3   total_col1  total_col2  total_col3
<chr>   <int>   <int>   <int>   <int>   <dbl>       <dbl>       <dbl>       <int>       <int>       <int>
1       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       NA      1       1       NA          31          32.0        NA          31          32
m       2       1       1       NA      25          12          NA          25          12          NA
2       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       2       1       NA      48          51          NA          96          51          NA
m       2       1       2       NA      96          80          NA          96          160         NA
3       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
m       2       NA      NA      2       NA          NA          40.5        NA          NA          81
4       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       2       1       NA      43          22          NA          86          22          NA
m       2       NA      2       NA      NA          43          NA          NA          86          NA
5       1       NA      NA      NA      NA          NA          NA          NA          NA          NA <-
f       2       1       1       NA      19          85          NA          19          85          NA
m       2       1       NA      NA      23          NA          NA          23          NA          NA

Ideally, there can be more columns to stack and level can go up to 5. for two levels 1, and 2, NA rows are required only for level 1, but for 4 levels (meaning 4 stacking columns), NA rows are required for level 1, 2, and 3.

The straight forward appraoch is to iterate the unique values, and keep joining the dataframes which is ofcourse not going to be an efficient approach. You can find the Python pandas based question and answer for the above problem at: Pandas stack multiple columns to a single column

Upvotes: 2

Views: 341

Answers (1)

Wimpel
Wimpel

Reputation: 27732

# store your dcast result
res.wide <- dcast(res, id+sex~col, value.var=c("n", "mean", "total"))
# split by id
L <- split(res.wide, by = "id", keep.by = FALSE)
# rbind empy row
rbindlist( 
  lapply(seq.int(L), 
         function(x) rbind(data.table(sex = names(L)[[x]]),
                           L[[x]], 
                           fill = TRUE))
)
#    sex n_col1 n_col2 n_col3 mean_col1 mean_col2 mean_col3 total_col1 total_col2 total_col3
# 1:   1     NA     NA     NA        NA        NA        NA         NA         NA         NA
# 2:   f      1     NA      1  23.00000        NA  64.00000         23         NA         64
# 3:   m      3     NA     NA  87.33333        NA        NA        262         NA         NA
# 4:   2     NA     NA     NA        NA        NA        NA         NA         NA         NA
# 5:   f     NA      1      1        NA        87  31.00000         NA         87         31
# 6:   m      1     NA     NA  79.00000        NA        NA         79         NA         NA
# 7:   3     NA     NA     NA        NA        NA        NA         NA         NA         NA
# 8:   f     NA      2     NA        NA        23        NA         NA         46         NA
# 9:   m      1     NA     NA  25.00000        NA        NA         25         NA         NA
#10:   4     NA     NA     NA        NA        NA        NA         NA         NA         NA
#11:   f     NA     NA      1        NA        NA  52.00000         NA         NA         52
#12:   m      1      1     NA  30.00000        37        NA         30         37         NA
#13:   5     NA     NA     NA        NA        NA        NA         NA         NA         NA
#14:   f     NA      2      3        NA        31  63.66667         NA         62        191
#15:   m     NA     NA      1        NA        NA  12.00000         NA         NA         12

       

Upvotes: 4

Related Questions