Reputation: 694
I want to calculate rolling cumulative sums by item in a data.table. Sometimes, data is missing for a given time period.
set.seed(8)
item <- c(rep("A",4), rep("B",3))
time <- c(1,2,3,4,1,3,4)
sales <- rpois(7,5)
DT <- data.table(item, time,sales)
For a rolling window of 2 time periods I want the following output:
item time sales sales_rolling2
1: A 1 5 5
2: A 2 3 8
3: A 3 7 10
4: A 4 6 13
5: B 1 4 4
6: B 3 6 6
7: B 4 4 10
Note, that item B has no data at time 2. Thus the result for row 6 just includes the latest observation.
Upvotes: 1
Views: 475
Reputation: 887881
We can use rollsum
from library(zoo)
to do the rolling sum. Before applying the rollsum
, I guess we need to create another grouping variable ('indx') based on the 'time' variable. I find that for the item 'B', the time is not continous, ie. 2 is missing. So, we can use diff
to create a logical index based on the difference of adjacent elements. If the difference is not 1, it will return TRUE or else FALSE. As the diff
output is of length
1 less than the length
of the column, we can pad with TRUE
and then do the cumsum
to create the 'indx' variable.
library(zoo)
DT[, indx:=cumsum(c(TRUE, diff(time)!=1))]
In the second step, we use both 'indx' and 'time' as the grouping variable, get the rollsum
of 'sales' with k=2
and also based on the condition that if the number of elements in the group is greater than 1 only we need to do this (if(.N >1)
), otherwise it should return the 'sales', create the 'sales_rolling2', and assign (:=
) the 'indx' to NULL as it is not needed in the expected output.
DT[, sales_rolling2 := if(.N>1) c(sales[1],rollsum(sales,2)) else sales,
by = .(indx, item)][,indx:= NULL]
# item time sales sales_rolling2
#1: A 1 5 5
#2: A 2 3 8
#3: A 3 7 10
#4: A 4 6 13
#5: B 1 4 4
#6: B 3 6 6
#7: B 4 4 10
As per @Khashaa's suggestion, we can use roll_sum
from library(RcppRoll)
can be used more effectively as it will even work with number of rows less than 'k'. In this way, we can remove the if/else
condition in my previous solution. (Full credit to @Khashaa)
library(RcppRoll)
DT[, sales_rolling2 := c(sales[1L], roll_sum(sales, 2)), by = .(indx, item)]
Upvotes: 3