summarize data.table .SD, while sorting by one of .SDcols

Question

How do I calculate a cumulative sum of .SD columns while sorting by the first of .SDcols (without summing this first column)

For a toy example:

require(data.table)
dt<-data.table(expand.grid(dow = 1:5, person = c("alice", "bob")))
setcolorder(dt, c("person", "dow"))
dt[, `:=`(current1 = runif(nrow(dt)), current2 = runif(nrow(dt)))]

The data prior to summation:

    person dow current1 current2
1:  alice   1    0.266   0.9447
2:  alice   2    0.372   0.6608
3:  alice   3    0.573   0.6291
4:    bob   1    0.908   0.0618
5:    bob   2    0.202   0.2060
6:    bob   3    0.898   0.1766

dt[order(dow), lapply(.SD, cumsum), by = c("person")] yields the desired cumulative sums of current1 and current2:

    person dow current1 current2
1:  alice   1    0.266   0.9447
2:  alice   3    0.638   1.6055
3:  alice   6    1.210   2.2346
4:    bob   1    0.908   0.0618
5:    bob   3    1.110   0.2678
6:    bob   6    2.008   0.4443

but also, undesirably, sums dow.

dt[order(dow), lapply(.SD, cumsum), by = c("person"), .SDcols = c("current1", "current2")] drops the dow col. altogher.

The desired output should have the dow column ordered but not summed.

akrun · Accepted Answer

We can specify the columns of interest in .SDcols and update the other columns by assignment operator := to assign the output values in place

dt[order(dow), c('current1', 'current2') := lapply(.SD, cumsum),
      by = .(person), .SDcols = current1:current2]

NOTE: The function used is cumsum, so it will generate the same number of rows as in the original dataset

summarize data.table .SD, while sorting by one of .SDcols

Answers (1)

Related Questions