Dinesh
Dinesh

Reputation: 285

Computing rolling mean in data.table with adaptive window lengths

I am looking to compute a moving average by group in a data.table with an adaptive window so that there are no NAs at the beginning of the time series. I know how to do this with frollmean and setting adaptive = TRUE (see for instance jangorecki's response in this thread). I can get the same code to work when all groups in my data.table are of the same length but run into errors when the groups are of different size.

For example, if my data is

tmp = data.table(Gp = c(rep('A',6),rep('B',4)), Val = c(1,3,4,6,2,2,8,5,7,10))

and I am doing a moving average of length 3, then the desired response is

> desired_output
    Gp  Val
 1:  A 1.00
 2:  A 2.00
 3:  A 2.67
 4:  A 4.33
 5:  A 4.00
 6:  A 3.33
 7:  B 8.00
 8:  B 6.50
 9:  B 6.67
10:  B 7.33

I tried the following:

mov_window_len = vector("list",2)
mov_window_len[[1]] = c(1,2,rep(3,4))
mov_window_len[[2]] = c(1,2,rep(3,2))
tmp[,lapply(.SD, frollmean, n = mov_window_len, align = "right", adaptive = TRUE), by = Gp]

but I get an error saying length of integer vector(s) provided as list to 'n' argument must be equal to number of observations provided in 'x'

Any help in resolving this will be much appreciated. Thanks in advance.

Upvotes: 3

Views: 445

Answers (1)

IceCreamToucan
IceCreamToucan

Reputation: 28675

You can use the group index .GRP to subset mov_window_len. This will give you the right lengths for each group. You only want to take frollmean of Val, so no need for lapply.

tmp[, frollmean(Val, n = mov_window_len[.GRP], align = "right", adaptive = TRUE), by = Gp]

#     Gp       V1
#  1:  A 1.000000
#  2:  A 2.000000
#  3:  A 2.666667
#  4:  A 4.333333
#  5:  A 4.000000
#  6:  A 3.333333
#  7:  B 8.000000
#  8:  B 6.500000
#  9:  B 6.666667
# 10:  B 7.333333

Alternatively window length can be added to input data.table (Len field below), as it corresponds to each row.

tmp[Gp=="A", Len:=mov_window_len[[1]]
    ][Gp=="B", Len:=mov_window_len[[2]]
     ][, .(Val, Len, RollVal=frollmean(Val, Len, adaptive=TRUE)), by=Gp]
#    Gp Val Len  RollVal
# 1:  A   1   1 1.000000
# 2:  A   3   2 2.000000
# 3:  A   4   3 2.666667
# 4:  A   6   3 4.333333
# 5:  A   2   3 4.000000
# 6:  A   2   3 3.333333
# 7:  B   8   1 8.000000
# 8:  B   5   2 6.500000
# 9:  B   7   3 6.666667
#10:  B  10   3 7.333333

Upvotes: 3

Related Questions