Kristen Cyr
Kristen Cyr

Reputation: 726

How to calculate Moving Average with time series data at 24, 48, and 72 hours?

I am trying to calculate the moving average for each name in the dataset below at 1, 2 , and 3 days. Here is the example of what my dataset looks like (the dput example was a little long for this question):

 Date_Time_GMT_3         name                      value
  <dttm>                 <chr>                     <dbl>
1 2021-07-08 07:15:00  20817729_2PT_Stationary    21.5
2 2021-07-08 07:15:00  20822229_4PT_Stationary    17.9
3 2021-07-08 07:15:00  20819744_6PT_AIR           17.9
4 2021-07-08 08:30:00  20817729_2PT_Stationary    18.6
5 2021-07-08 08:30:00  20822229_4PT_Stationary    16.5
6 2021-07-08 08:30:00  20819744_6PT_AIR           19.0
7 2021-07-09 10:45:00  20817729_2PT_Stationary    21.5
8 2021-07-09 10:45:00  20822229_4PT_Stationary    17.9
9 2021-07-09 10:45:00  20819744_6PT_AIR           17.9
10 2021-07-11 09:15:00  20817729_2PT_Stationary    18.6
11 2021-07-11 09:15:00  20822229_4PT_Stationary    16.5
12 2021-07-11 09:15:00  20819744_6PT_AIR           19.0
13 2021-07-12 15:45:00  20817729_2PT_Stationary    21.5
14 2021-07-12 15:45:00  20822229_4PT_Stationary    17.9
15 2021-07-12 15:45:00  20819744_6PT_AIR           17.9
16 2021-07-13 06:15:00  20817729_2PT_Stationary    18.6
17 2021-07-13 06:15:00  20822229_4PT_Stationary    16.5
18 2021-07-13 06:15:00  20819744_6PT_AIR           19.0

I have tried using this code

some_data = df %>%
  dplyr::group_by(name) %>% 
  dplyr::mutate(mean_01da = zoo::rollmean(value, k = 1, fill = NA),
                mean_02da = zoo::rollmean(value, k = 2, fill = NA),
                mean_03da = zoo::rollmean(value, k = 3, fill = NA)) %>% 
  dplyr::ungroup()

but the moving average isn't over the time specified - I'm thinking because k should be the number of rows I want to use for values. Is there a way to calculate the moving average by specifying the number of days to calculate over (i.e. 1, 2, and 3 days) using the date/time column?

Upvotes: 0

Views: 639

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 270055

Use an expression involving findInterval to find how many rows back to use and note that in rollapplyr we can specify a vector for the number of rows to use. POSIXct objects use seconds internally so 24 * 3600 is the number of seconds in 24 hours, etc.

library(dplyr, exclude = c("lag", "filter"))
library(zoo)

DF %>%
  group_by(name) %>%
  mutate(tt = Date_Time_GMT_3,
    day1 = rollapplyr(value, 1:n() - findInterval(tt - 24 * 3600, tt), mean),
    day2 = rollapplyr(value, 1:n() - findInterval(tt - 48 * 3600, tt), mean),
    day3 = rollapplyr(value, 1:n() - findInterval(tt - 72 * 3600, tt), mean)) %>%
  ungroup %>%
  select(-tt)

Note

The input in reproducible form:

DF <-
structure(list(Date_Time_GMT_3 = structure(c(1625742900, 1625742900, 
1625742900, 1625747400, 1625747400, 1625747400, 1625841900, 1625841900, 
1625841900, 1626009300, 1626009300, 1626009300, 1626119100, 1626119100, 
1626119100, 1626171300, 1626171300, 1626171300), class = c("POSIXct", 
"POSIXt"), tzone = ""), name = c("20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR", "20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR", "20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR", "20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR", "20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR", "20817729_2PT_Stationary", "20822229_4PT_Stationary", 
"20819744_6PT_AIR"), value = c(21.5, 17.9, 17.9, 18.6, 16.5, 
19, 21.5, 17.9, 17.9, 18.6, 16.5, 19, 21.5, 17.9, 17.9, 18.6, 
16.5, 19)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18"))

Upvotes: 1

Related Questions