lever
lever

Reputation: 694

rolling cumulative sums conditional on missing data

I want to calculate rolling cumulative sums by item in a data.table. Sometimes, data is missing for a given time period.

set.seed(8)
item <- c(rep("A",4), rep("B",3))
time <- c(1,2,3,4,1,3,4)
sales <- rpois(7,5)
DT <-  data.table(item, time,sales)

For a rolling window of 2 time periods I want the following output:

   item time sales sales_rolling2
1:    A    1     5              5
2:    A    2     3              8
3:    A    3     7             10
4:    A    4     6             13
5:    B    1     4              4
6:    B    3     6              6
7:    B    4     4             10

Note, that item B has no data at time 2. Thus the result for row 6 just includes the latest observation.

Upvotes: 1

Views: 475

Answers (1)

akrun
akrun

Reputation: 887881

We can use rollsum from library(zoo) to do the rolling sum. Before applying the rollsum, I guess we need to create another grouping variable ('indx') based on the 'time' variable. I find that for the item 'B', the time is not continous, ie. 2 is missing. So, we can use diff to create a logical index based on the difference of adjacent elements. If the difference is not 1, it will return TRUE or else FALSE. As the diff output is of length 1 less than the length of the column, we can pad with TRUE and then do the cumsum to create the 'indx' variable.

library(zoo)
DT[, indx:=cumsum(c(TRUE, diff(time)!=1))]

In the second step, we use both 'indx' and 'time' as the grouping variable, get the rollsum of 'sales' with k=2 and also based on the condition that if the number of elements in the group is greater than 1 only we need to do this (if(.N >1)), otherwise it should return the 'sales', create the 'sales_rolling2', and assign (:=) the 'indx' to NULL as it is not needed in the expected output.

DT[, sales_rolling2 := if(.N>1) c(sales[1],rollsum(sales,2)) else sales,
               by = .(indx, item)][,indx:= NULL]
#   item time sales sales_rolling2
#1:    A    1     5              5
#2:    A    2     3              8
#3:    A    3     7             10
#4:    A    4     6             13
#5:    B    1     4              4
#6:    B    3     6              6
#7:    B    4     4             10

Update

As per @Khashaa's suggestion, we can use roll_sum from library(RcppRoll) can be used more effectively as it will even work with number of rows less than 'k'. In this way, we can remove the if/else condition in my previous solution. (Full credit to @Khashaa)

library(RcppRoll)
DT[, sales_rolling2 := c(sales[1L], roll_sum(sales, 2)), by = .(indx, item)]

Upvotes: 3

Related Questions