dzeltzer
dzeltzer

Reputation: 1000

summarize data.table .SD, while sorting by one of .SDcols

How do I calculate a cumulative sum of .SD columns while sorting by the first of .SDcols (without summing this first column)

For a toy example:

require(data.table)
dt<-data.table(expand.grid(dow = 1:5, person = c("alice", "bob")))
setcolorder(dt, c("person", "dow"))
dt[, `:=`(current1 = runif(nrow(dt)), current2 = runif(nrow(dt)))]

The data prior to summation:

    person dow current1 current2
1:  alice   1    0.266   0.9447
2:  alice   2    0.372   0.6608
3:  alice   3    0.573   0.6291
4:    bob   1    0.908   0.0618
5:    bob   2    0.202   0.2060
6:    bob   3    0.898   0.1766

dt[order(dow), lapply(.SD, cumsum), by = c("person")] yields the desired cumulative sums of current1 and current2:

    person dow current1 current2
1:  alice   1    0.266   0.9447
2:  alice   3    0.638   1.6055
3:  alice   6    1.210   2.2346
4:    bob   1    0.908   0.0618
5:    bob   3    1.110   0.2678
6:    bob   6    2.008   0.4443

but also, undesirably, sums dow.

dt[order(dow), lapply(.SD, cumsum), by = c("person"), .SDcols = c("current1", "current2")] drops the dow col. altogher.

The desired output should have the dow column ordered but not summed.

Upvotes: 1

Views: 142

Answers (1)

akrun
akrun

Reputation: 887951

We can specify the columns of interest in .SDcols and update the other columns by assignment operator := to assign the output values in place

dt[order(dow), c('current1', 'current2') := lapply(.SD, cumsum),
      by = .(person), .SDcols = current1:current2]

NOTE: The function used is cumsum, so it will generate the same number of rows as in the original dataset

Upvotes: 2

Related Questions