Reputation: 1000
How do I calculate a cumulative sum of .SD columns while sorting by the first of .SDcols (without summing this first column)
For a toy example:
require(data.table)
dt<-data.table(expand.grid(dow = 1:5, person = c("alice", "bob")))
setcolorder(dt, c("person", "dow"))
dt[, `:=`(current1 = runif(nrow(dt)), current2 = runif(nrow(dt)))]
The data prior to summation:
person dow current1 current2
1: alice 1 0.266 0.9447
2: alice 2 0.372 0.6608
3: alice 3 0.573 0.6291
4: bob 1 0.908 0.0618
5: bob 2 0.202 0.2060
6: bob 3 0.898 0.1766
dt[order(dow), lapply(.SD, cumsum), by = c("person")]
yields the desired cumulative sums of current1
and current2
:
person dow current1 current2
1: alice 1 0.266 0.9447
2: alice 3 0.638 1.6055
3: alice 6 1.210 2.2346
4: bob 1 0.908 0.0618
5: bob 3 1.110 0.2678
6: bob 6 2.008 0.4443
but also, undesirably, sums dow
.
dt[order(dow), lapply(.SD, cumsum), by = c("person"), .SDcols = c("current1", "current2")]
drops the dow
col. altogher.
The desired output should have the dow column ordered but not summed.
Upvotes: 1
Views: 142
Reputation: 887951
We can specify the columns of interest in .SDcols
and update the other columns by assignment operator :=
to assign the output values in place
dt[order(dow), c('current1', 'current2') := lapply(.SD, cumsum),
by = .(person), .SDcols = current1:current2]
NOTE: The function used is cumsum
, so it will generate the same number of rows as in the original dataset
Upvotes: 2