Reputation: 709
I would like to calculate the moving average of a variable in R with a changing window size. To be more specific: The moving average should be calculated over three years, but the data (time-series) is available in higher frequency and the window size can vary for each three-year window.
Assume the following dataset:
library(data.table)
set.seed(1) # reproduceable data
dataset <- data.table(ID=c(rep("A",2208),rep("B",2208)),
x = c(rnorm(2208*2)), time=c(seq(as.Date("1988/03/15"),
as.Date("2000/04/16"), "day"),seq(as.Date("1988/03/15"),
as.Date("2000/04/16"), "day")))
The three-year moving average of the variable x should be calculated for both of the IDs (person A and B). Can this be done with zoo
and datatable
preferably? But any solution is fine.
Please note I know how to do this with a fixed window size, the problem here is the varying window size.
Upvotes: 2
Views: 552
Reputation: 42544
If I understand correctly the OP wants to span exactly 3 years. As this may include leap years, the window size can be either 1095 days or 1096 days.
This can be solved by aggregating in a non-equi join together with lubridate
's rollback date arithmetic.
library(data.table)
library(lubridate)
# create 3 years windows for each ID for later non-equi join
win <- dataset[, CJ(ID = ID, start = time, unique = TRUE)][
# make sure to pick
, end := start %m+% years(3) - days(1)][
# remove windows which end out of date range
end <= max(start)]
win
ID start end 1: A 1988-03-15 1991-03-14 2: A 1988-03-16 1991-03-15 3: A 1988-03-17 1991-03-16 4: A 1988-03-18 1991-03-17 5: A 1988-03-19 1991-03-18 --- 6638: B 1997-04-13 2000-04-12 6639: B 1997-04-14 2000-04-13 6640: B 1997-04-15 2000-04-14 6641: B 1997-04-16 2000-04-15 6642: B 1997-04-17 2000-04-16
# check window lengths
win[, .N, by = .(days = end - start + 1L)]
days N 1: 1095 days 2166 2: 1096 days 4476
# see what happens in leap years
win[leap_year(start) & month(start) == 2 & day(start) %in% 28:29,
.(start, end, days = end - start + 1L)]
start end days 1: 1992-02-28 1995-02-27 1096 days 2: 1992-02-29 1995-02-27 1095 days 3: 1996-02-28 1999-02-27 1096 days 4: 1996-02-29 1999-02-27 1095 days 5: 1992-02-28 1995-02-27 1096 days 6: 1992-02-29 1995-02-27 1095 days 7: 1996-02-28 1999-02-27 1096 days 8: 1996-02-29 1999-02-27 1095 days
win[leap_year(end) & month(end) == 2 & day(end) %in% 28:29,
.(start, end, days = end - start + 1L)]
start end days 1: 1989-03-01 1992-02-29 1096 days 2: 1993-03-01 1996-02-29 1096 days 3: 1997-03-01 2000-02-29 1096 days 4: 1989-03-01 1992-02-29 1096 days 5: 1993-03-01 1996-02-29 1096 days 6: 1997-03-01 2000-02-29 1096 days
# aggregate in a non-equi-join
dataset[win, on = .(ID, time >= start, time <= end), by = .EACHI, .(avg = mean(x))]
ID time time avg 1: A 1988-03-15 1991-03-14 -0.01184078 2: A 1988-03-16 1991-03-15 -0.01317813 3: A 1988-03-17 1991-03-16 -0.01179571 4: A 1988-03-18 1991-03-17 -0.01006100 5: A 1988-03-19 1991-03-18 -0.01221798 --- 6638: B 1997-04-13 2000-04-12 -0.03412214 6639: B 1997-04-14 2000-04-13 -0.03604176 6640: B 1997-04-15 2000-04-14 -0.03556291 6641: B 1997-04-16 2000-04-15 -0.03392185 6642: B 1997-04-17 2000-04-16 -0.03393674
Upvotes: 4
Reputation: 47300
As @A.Suliman mentions in comments your example data has a fixed window width, but let's assume your real data hasn't as that's what your text says.
The parameter width
from rollapply
doesn't have to be constant, so we can compute the width first, make sure we align the window to the left, and run rollaply
.
library(zoo)
library(tidyverse)
dataset %>%
arrange(ID,time) %>%
group_by(ID) %>%
mutate(avg = rollapply(x, FUN = mean, align = "left",
width = map_dbl(time, ~which.max(time[time < .x + 3*365.25])) - row_number()+1))
#
# # A tibble: 8,832 x 4
# # Groups: ID [2]
# ID x time avg
# <fctr> <dbl> <date> <dbl>
# 1 A -0.0258 1988-03-15 0.0109
# 2 A -0.0258 1988-03-15 0.0109
# 3 A -0.1562 1988-03-16 0.0115
# 4 A -0.1562 1988-03-16 0.0115
# 5 A 0.8193 1988-03-17 0.0115
# 6 A 0.8193 1988-03-17 0.0112
# 7 A -1.1136 1988-03-18 0.0102
# 8 A -1.1136 1988-03-18 0.0107
# 9 A -0.9336 1988-03-19 0.0105
# 10 A -0.9336 1988-03-19 0.0109
Upvotes: 2