Reputation: 1177

How can I compute a variable lag value to create a moving average over different time spans?

My problem is quite simple but I still cannot find an easy solution that doesn't require me to create a lot of unnecessary filler rows.

Given this dataset:

df <- structure(
  list(
    date = c(
      2015.16666666667,
      2015.33333333333,
      2015.83333333333,
      2016,
      2016.08333333333,
      2016.25,
      2016.33333333333,
      2016.41666666667,
      2016.5,
      2016.66666666667
    ),
    Age = c(
      1,
      2.99999999999818,
      8.99999999999818,
      10.9999999999991,
      11.9999999999982,
      13.9999999999991,
      14.9999999999982,
      16,
      16.9999999999991,
      19
    ),
    year = c(
      2015L,
      2015L,
      2015L,
      2015L,
      2016L,
      2016L,
      2016L,
      2016L,
      2016L,
      2016L
    ),
    month = c(2L, 4L,
              10L, 12L, 1L, 3L, 4L, 5L, 6L, 8L),
    r_Total = c(
      481.02,
      666.36,
      851.7,
      1633.74,
      2155.1,
      2613.74,
      3105.44,
      4429.52,
      5170.88,
      5170.88
    )
  ),
  row.names = c(NA,-10L),
  class = c("tbl_df", "tbl", "data.frame")
)

I want to compute a moving average of r_total for the last 12 months. However the data has no rows for months where r_total was 0 so my usual solution does not work:

library(dplyr)

df %>%
mutate(cummulative_sum = cumsum(r_total), moving_average = (cummulative_sum-lag(cummulative_sum,12)  )/12)

This computes a moving average for the last 12 values but crucially not the last 12 months!

lag() which I use here just looks in the ordered vector and identifies the value from the -12th position. However I need a function that gives me the value from when Age == Age-12 (Age being months since inception of the value).

So what can I do?

Upvotes: 0

Answers (2)

MrGumble

Reputation: 5766

I too have been struggling with some of these moving-windows issues. And in cases where the step size or window size is not a constant number of rows, tidyverse-approaches can become cumbersome.

In these cases, although I sometimes try to hard to fit it into a piped fashion, ordinary loops can be easier to work with.

First attempt with a loop. Instead of thinking the main input is the data.frame, our main input is actually the dates at which we want to look back the 12 months (which so happens to be derived from df, but you could instead choose to go by a calendar year or whatnot).

Remember, when using loops to build a result-set, use lists or pre-allocated result-vectors instead of growing a data.frame or vector by repeatedly appending.

df$yearmonth <- with(df, year + (month-1)/12)
df$cummulative <- NA_real_


for (i in seq_along(df$Age)) {
  df$cummulative[i] <- df %>% 
    filter(between(Age, df$Age[i]-11, df$Age[i])) %>% ## any rows within the past 12 months
    pull(r_Total) %>% sum
}

> df
# A tibble: 10 x 6
    date   Age  year month r_Total cummulative
   <dbl> <dbl> <int> <int>   <dbl>       <dbl>
 1 2015.  1     2015     2    481.        481.
 2 2015.  3.00  2015     4    666.       1147.
 3 2016.  9.00  2015    10    852.       1999.
 4 2016  11.0   2015    12   1634.       3633.
 5 2016. 12.0   2016     1   2155.       5788.
 6 2016. 14.0   2016     3   2614.       7254.
 7 2016. 15.0   2016     4   3105.      10360.
 8 2016. 16     2016     5   4430.      14789.
 9 2016. 17.0   2016     6   5171.      19960.
10 2017. 19     2016     8   5171.      25131

It is however unclear what you are averaging over; is the cummulative r_total always divided by 12 months, even when it's sum is only 2 rows?

Upvotes: 0

Jon Spring

Reputation: 66510

The slider package is great for when you need to use another column to define the time window.

library(slider)
df %>% 
  mutate(avg_12mo = slide_index_dbl(r_Total, Age, mean, .before = 11),
         sum_12mo = slide_index_dbl(r_Total, Age, sum, .before = 11))


# A tibble: 10 x 7
    date   Age  year month r_Total avg_12mo sum_12mo
   <dbl> <dbl> <int> <int>   <dbl>    <dbl>    <dbl>
 1 2015.  1     2015     2    481.     481.     481.
 2 2015.  3.00  2015     4    666.     574.    1147.
 3 2016.  9.00  2015    10    852.     666.    1999.
 4 2016  11.0   2015    12   1634.     908.    3633.
 5 2016. 12.0   2016     1   2155.    1158.    5788.
 6 2016. 14.0   2016     3   2614.    1814.    7254.
 7 2016. 15.0   2016     4   3105.    2072.   10360.
 8 2016. 16     2016     5   4430.    2465.   14789.
 9 2016. 17.0   2016     6   5171.    2851.   19960.
10 2017. 19     2016     8   5171.    3141.   25131

Upvotes: 1

How can I compute a variable lag value to create a moving average over different time spans?

Answers (2)

Related Questions