User878239
User878239

Reputation: 709

Moving average with changing window

I would like to calculate the moving average of a variable in R with a changing window size. To be more specific: The moving average should be calculated over three years, but the data (time-series) is available in higher frequency and the window size can vary for each three-year window.

Assume the following dataset:

library(data.table)
set.seed(1)   # reproduceable data
dataset <- data.table(ID=c(rep("A",2208),rep("B",2208)),
x = c(rnorm(2208*2)), time=c(seq(as.Date("1988/03/15"),
as.Date("2000/04/16"), "day"),seq(as.Date("1988/03/15"),
as.Date("2000/04/16"), "day")))

The three-year moving average of the variable x should be calculated for both of the IDs (person A and B). Can this be done with zoo and datatable preferably? But any solution is fine.

Please note I know how to do this with a fixed window size, the problem here is the varying window size.

Upvotes: 2

Views: 552

Answers (2)

Uwe
Uwe

Reputation: 42544

If I understand correctly the OP wants to span exactly 3 years. As this may include leap years, the window size can be either 1095 days or 1096 days.

This can be solved by aggregating in a non-equi join together with lubridate's rollback date arithmetic.

library(data.table)
library(lubridate)
# create 3 years windows for each ID for later non-equi join
win <- dataset[, CJ(ID = ID, start = time, unique = TRUE)][
  # make sure to pick
  , end := start %m+% years(3) - days(1)][
    # remove windows which end out of date range
    end <= max(start)]
win
      ID      start        end
   1:  A 1988-03-15 1991-03-14
   2:  A 1988-03-16 1991-03-15
   3:  A 1988-03-17 1991-03-16
   4:  A 1988-03-18 1991-03-17
   5:  A 1988-03-19 1991-03-18
  ---                         
6638:  B 1997-04-13 2000-04-12
6639:  B 1997-04-14 2000-04-13
6640:  B 1997-04-15 2000-04-14
6641:  B 1997-04-16 2000-04-15
6642:  B 1997-04-17 2000-04-16
# check window lengths
win[, .N, by = .(days = end - start + 1L)]
        days    N
1: 1095 days 2166
2: 1096 days 4476
# see what happens in leap years
win[leap_year(start) & month(start) == 2 & day(start) %in% 28:29, 
  .(start, end, days = end - start + 1L)]
        start        end      days
1: 1992-02-28 1995-02-27 1096 days
2: 1992-02-29 1995-02-27 1095 days
3: 1996-02-28 1999-02-27 1096 days
4: 1996-02-29 1999-02-27 1095 days
5: 1992-02-28 1995-02-27 1096 days
6: 1992-02-29 1995-02-27 1095 days
7: 1996-02-28 1999-02-27 1096 days
8: 1996-02-29 1999-02-27 1095 days
win[leap_year(end) & month(end) == 2 & day(end) %in% 28:29,
    .(start, end, days = end - start + 1L)]
        start        end      days
1: 1989-03-01 1992-02-29 1096 days
2: 1993-03-01 1996-02-29 1096 days
3: 1997-03-01 2000-02-29 1096 days
4: 1989-03-01 1992-02-29 1096 days
5: 1993-03-01 1996-02-29 1096 days
6: 1997-03-01 2000-02-29 1096 days
# aggregate in a non-equi-join
dataset[win, on = .(ID, time >= start, time <= end), by = .EACHI, .(avg = mean(x))]
      ID       time       time         avg
   1:  A 1988-03-15 1991-03-14 -0.01184078
   2:  A 1988-03-16 1991-03-15 -0.01317813
   3:  A 1988-03-17 1991-03-16 -0.01179571
   4:  A 1988-03-18 1991-03-17 -0.01006100
   5:  A 1988-03-19 1991-03-18 -0.01221798
  ---                                     
6638:  B 1997-04-13 2000-04-12 -0.03412214
6639:  B 1997-04-14 2000-04-13 -0.03604176
6640:  B 1997-04-15 2000-04-14 -0.03556291
6641:  B 1997-04-16 2000-04-15 -0.03392185
6642:  B 1997-04-17 2000-04-16 -0.03393674

Upvotes: 4

moodymudskipper
moodymudskipper

Reputation: 47300

As @A.Suliman mentions in comments your example data has a fixed window width, but let's assume your real data hasn't as that's what your text says.

The parameter width from rollapply doesn't have to be constant, so we can compute the width first, make sure we align the window to the left, and run rollaply.

library(zoo)
library(tidyverse)
dataset %>% 
  arrange(ID,time) %>%
  group_by(ID) %>%
  mutate(avg = rollapply(x, FUN = mean, align = "left",
                         width = map_dbl(time, ~which.max(time[time < .x + 3*365.25])) - row_number()+1))
#                          
# # A tibble: 8,832 x 4
# # Groups:   ID [2]
#         ID       x       time    avg
#     <fctr>   <dbl>     <date>  <dbl>
#   1      A -0.0258 1988-03-15 0.0109
#   2      A -0.0258 1988-03-15 0.0109
#   3      A -0.1562 1988-03-16 0.0115
#   4      A -0.1562 1988-03-16 0.0115
#   5      A  0.8193 1988-03-17 0.0115
#   6      A  0.8193 1988-03-17 0.0112
#   7      A -1.1136 1988-03-18 0.0102
#   8      A -1.1136 1988-03-18 0.0107
#   9      A -0.9336 1988-03-19 0.0105
#  10      A -0.9336 1988-03-19 0.0109

Upvotes: 2

Related Questions