Reputation: 339
I'm trying to get (reverse) cumulative sums in a moving window by group in data.table, i.e. for each row, the sum of "a" from that row to the end of its group. For example, from the following data I'd like to get the values in the "roll_cumsum" column:
library(data.table)
dt = data.table()
dt[, a := seq(1, 10, 1)]
dt[, group := rep(1:2, each = 5)]
dt[, roll_cumsum := c(15, 14, 12, 9, 5, 40, 34, 27, 19, 10)]
I got the results I wanted with the code below but it's quite slow for a large dataset:
partial_sum = function(x) { n <- seq_along(x); cumsum(x)[length(x)] - cumsum(x)[n] + x[n] }
dt[, partial_sum(a), by = group]
Any suggestions to make the calculation faster? Thank you so much!
Upvotes: 4
Views: 646
Reputation: 887108
There is a revcumsum function in the spatstat.utils package:
library(spatstat.utils)
dt[, roll_cumsum2 := revcumsum(a), group]
-output
dt
# a group roll_cumsum roll_cumsum2
# 1: 1 1 15 15
# 2: 2 1 14 14
# 3: 3 1 12 12
# 4: 4 1 9 9
# 5: 5 1 5 5
# 6: 6 2 40 40
# 7: 7 2 34 34
# 8: 8 2 27 27
# 9: 9 2 19 19
#10: 10 2 10 10
Or just do the reverse with rev:
dt[, roll_cumsum2 := rev(cumsum(rev(a))), group]
-output
dt
# a group roll_cumsum roll_cumsum2
# 1: 1 1 15 15
# 2: 2 1 14 14
# 3: 3 1 12 12
# 4: 4 1 9 9
# 5: 5 1 5 5
# 6: 6 2 40 40
# 7: 7 2 34 34
# 8: 8 2 27 27
# 9: 9 2 19 19
#10: 10 2 10 10
Or another way is
dt[, roll_cumsum2 := cumsum(a[.N:1])[.N:1], group]
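As a quick sanity check, here is a minimal sketch that runs the two base R expressions above on the question's sample data and compares them with the expected column. The column names via_rev and via_index and the vector expected are made up for this illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(a = 1:10, group = rep(1:2, each = 5))
expected <- c(15, 14, 12, 9, 5, 40, 34, 27, 19, 10)

# Reverse, accumulate, reverse back (per group)
dt[, via_rev := rev(cumsum(rev(a))), by = group]

# Same idea, reversing by index with .N (the number of rows in the group)
dt[, via_index := cumsum(a[.N:1])[.N:1], by = group]

# Both columns should match the values the question asks for
stopifnot(all(dt$via_rev == expected), all(dt$via_index == expected))
```

Both variants do the same work; the index form just avoids the two calls to rev.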
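NOTE: The last two are compact versions that need no extra package. Timings on a larger dataset (partial_sum is the function from the question):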
dt1 <- data.table(a = 1:1e7, group = rep(1:1e6, length.out = 1e7, 10))
system.time(dt1[, roll_cumsum := partial_sum(a), by = group])
#user system elapsed
# 2.073 0.037 2.094
system.time(dt1[, roll_cumsum2 := revcumsum(a), group])
#user system elapsed
# 2.623 0.029 2.637
system.time(dt1[, roll_cumsum3 := rev(cumsum(rev(a))), group])
#user system elapsed
# 4.275 0.051 4.276
system.time(dt1[, roll_cumsum4 := cumsum(a[.N:1])[.N:1], group])
#user system elapsed
# 1.703 0.028 1.722
system.time(dt1[, roll_cumsum5 := sum(a) - cumsum(shift(a, fill = 0)), group])
# user system elapsed
# 10.095 0.041 10.129
Upvotes: 2
Reputation: 388982
You can subtract the cumulative sum of the lagged a (shifted by one, with the first value filled with 0) from sum(a) in each group.
library(data.table)
dt[, roll_cumsum1 := sum(a) - cumsum(shift(a, fill = 0)), group]
dt
# a group roll_cumsum roll_cumsum1
# 1: 1 1 15 15
# 2: 2 1 14 14
# 3: 3 1 12 12
# 4: 4 1 9 9
# 5: 5 1 5 5
# 6: 6 2 40 40
# 7: 7 2 34 34
# 8: 8 2 27 27
# 9: 9 2 19 19
#10: 10 2 10 10
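To see why the subtraction works, here is a small worked sketch on the first group as a plain vector. It uses c(0, head(a, -1)) as a stand-in for shift(a, fill = 0) so that it runs without data.table; the names before and result are only for illustration.

```r
# First group from the question, as a plain vector
a <- c(1, 2, 3, 4, 5)

# Lag by one position, filling the first slot with 0: 0 1 2 3 4
lagged <- c(0, head(a, -1))

# Sum of everything strictly before each position: 0 1 3 6 10
before <- cumsum(lagged)

# Group total minus what came before leaves the sum from each
# position to the end of the group: 15 14 12 9 5
result <- sum(a) - before
stopifnot(all(result == c(15, 14, 12, 9, 5)))
```

Lagging before accumulating is what keeps the current row's own value in the result. Without the lag, sum(a) - cumsum(a) would give 14 12 9 5 0.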
Upvotes: 2