user5316628

Reputation: 357

Rolling sum in specified range

For df, I want the rolling sum of the Value column over the last 10 seconds, with Time given in seconds. The data frame is very large (millions of data points at millisecond resolution), so using tidyr::complete is not an option. I would prefer a dplyr solution, but I think it may also be possible with data.table or a left_join; I just can't make it work.

df = data.frame(Row=c(1,2,3,4,5,6,7),Value=c(4,7,2,6,3,8,3),Time=c(10021,10023,10027,10035,10055,10058,10092))

The solution would add a column (Sum.10S) holding the rolling sum over the past 10 seconds:

df$Sum.10S=c(4,11,13,8,3,11,3)

Upvotes: 0

Views: 232

Answers (2)

G. Grothendieck

Reputation: 270248

Define a function sum10 that sums the values from the last 10 seconds and apply it with rollapplyr. This avoids an explicit loop and, on the data in the question, runs about 10x faster than one.

library(zoo)

sum10 <- function(x) {
  if (is.null(dim(x))) x <- t(x)
  tt <- x[, "Time"]
  sum(x[tt >= tail(tt, 1) - 10, "Value"])
}

transform(df, S10 = rollapplyr(df, 10, sum10, by.column = FALSE, partial = TRUE))

giving:

  Row Value  Time  S10
1   1     4 10021    4
2   2     7 10023   11
3   3     2 10027   13
4   4     6 10035    8
5   5     3 10055    3
6   6     8 10058   11
7   7     3 10092    3
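
A note on the width argument: the 10 passed to rollapplyr counts rows, not seconds, so the output matches the time-based sum only when no 10-second window contains more than 10 rows. With millisecond-level data that may not hold. The following is a rough sketch, in base R only, of one way to find the largest number of rows in any window so it can be passed as the width instead. It assumes df is sorted by Time in ascending order; the names rows_upto, rows_before, and width are made up for illustration.

```r
df <- data.frame(
  Row   = 1:7,
  Value = c(4, 7, 2, 6, 3, 8, 3),
  Time  = c(10021, 10023, 10027, 10035, 10055, 10058, 10092)
)

# Assumption: df$Time is sorted ascending.
# Number of rows with Time <= Time[i]
rows_upto <- findInterval(df$Time, df$Time)
# Number of rows with Time strictly less than Time[i] - 10
rows_before <- findInterval(df$Time - 10, df$Time, left.open = TRUE)

# Largest number of rows that fall inside any single 10-second window
width <- max(rows_upto - rows_before)
```

The resulting width could then replace the hard-coded 10 in the rollapplyr call above.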

Upvotes: 2

SeldomSeenSlim

Reputation: 841

Well I wasn't fast enough to get the first answer in. But this solution is simpler, and doesn't require an external library.

df = data.frame(Row=c(1,2,3,4,5,6,7),Value=c(4,7,2,6,3,8,3),Time=c(10021,10023,10027,10035,10055,10058,10092))

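# Preallocate the result, then sum Value over [Time[i] - 10, Time[i]] for each row
df$SumR <- NA_real_
for (i in seq_len(nrow(df))) {
  in_window <- df$Time <= df$Time[i] & df$Time >= df$Time[i] - 10
  df$SumR[i] <- sum(df$Value[in_window])
}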

  Row Value  Time SumR
1   1     4 10021    4
2   2     7 10023   11
3   3     2 10027   13
4   4     6 10035    8
5   5     3 10055    3
6   6     8 10058   11
7   7     3 10092    3
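
Since the loop compares every row against every other row, it may get slow on millions of rows. As a possible alternative, here is a minimal sketch that uses cumsum and findInterval from base R rather than a loop. It is not part of the original answer, and it assumes df is sorted by Time in ascending order; the helper names cs, upto, and before are illustrative.

```r
df <- data.frame(
  Row   = 1:7,
  Value = c(4, 7, 2, 6, 3, 8, 3),
  Time  = c(10021, 10023, 10027, 10035, 10055, 10058, 10092)
)

# Assumption: df$Time is sorted ascending.
cs <- cumsum(df$Value)

# Index of the last row with Time <= Time[i] (handles tied times too)
upto <- findInterval(df$Time, df$Time)
# Number of rows with Time strictly less than Time[i] - 10;
# these rows fall outside the 10-second window
before <- findInterval(df$Time - 10, df$Time, left.open = TRUE)

# Window sum = cumulative sum up to the window end
#            - cumulative sum of everything before the window start
df$Sum.10S <- cs[upto] - c(0, cs)[before + 1]
```

On the sample data this gives the same Sum.10S values as requested in the question. Note that differences of cumulative sums can pick up floating-point error when Value is not integer-valued, so results may differ slightly from the loop in that case.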

Upvotes: 1
