Viðar Ingason
Viðar Ingason

Reputation: 295

Rolling sum based on date variable

I want to sum all values for a variable for the past 3 months according to exact day for each ID. There might be 2 obs for some ID, 70 obs for another ID etc. For example, on April 15 I want to look all the way back to the January 16 and sum all values on this interval including January 16 and April 15. There could by 2 observations on this interval or even 70.

The window size is therefore never the same which means I can't use rollapply from the zoo package.

Here is a small example data with expected output.

df <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 
3, 3, 3, 3, 3, 3, 3, 3), date = c("01/01/2015", "01/02/2015", 
"15/02/2015", "01/03/2015", "01/04/2015", "01/01/2015", "01/02/2015", 
"15/02/2015", "20/02/2015", "01/03/2015", "14/03/2015", "01/04/2015", 
"01/01/2015", "10/02/2015", "15/02/2015", "20/02/2015", "25/02/2015", 
"01/03/2015", "10/03/2015", "20/03/2015", "01/04/2015"), value = c(106, 
57, 43, 105, 180, 109, 36, 102, 97, 97, 113, 83, 178, 19, 80, 
167, 40, 5, 30, 124, 8), expected = c(106, 163, 206, 311, 385, 
109, 145, 247, 344, 441, 554, 528, 178, 197, 277, 444, 484, 489, 
519, 643, 473)), row.names = c(NA, -21L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Views: 264

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269346

The width in rollapply can be a vector of widths and we can compute those using findInterval:

library(dplyr)
library(lubridate)
library(zoo)

df %>%
  group_by(id) %>%
  mutate(date = dmy(date), 
         expected2 = rollapplyr(value, 
          width = 1:n() - findInterval(date - months(3), date), 
          FUN = sum, na.rm = TRUE, fill = NA)) %>%
  ungroup

giving:

# A tibble: 21 x 5
      id date       value expected expected2
   <dbl> <date>     <dbl>    <dbl>     <dbl>
 1     1 2015-01-01   106      106       106
 2     1 2015-02-01    57      163       163
 3     1 2015-02-15    43      206       206
 4     1 2015-03-01   105      311       311
 5     1 2015-04-01   180      385       385
 6     2 2015-01-01   109      109       109
 7     2 2015-02-01    36      145       145
 8     2 2015-02-15   102      247       247
 9     2 2015-02-20    97      344       344
10     2 2015-03-01    97      441       441
# ... with 11 more rows

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388797

We convert the date to actual date object, group_by id and for each date find out the dates in the group which lie in last 3 months and take sum of their respective value.

library(dplyr)

df %>%
  mutate(date = as.Date(date, "%d/%m/%Y")) %>%
  group_by(id) %>%
  mutate(ans = purrr::map_dbl(date, ~ sum(value[
             date > (. -  months(3)) & date <= .])))


#      id date       value expected   ans
#    <dbl> <date>     <dbl>    <dbl> <dbl>
# 1     1 2015-01-01   106      106   106
# 2     1 2015-02-01    57      163   163
# 3     1 2015-02-15    43      206   206
# 4     1 2015-03-01   105      311   311
# 5     1 2015-04-01   180      385   385
# 6     2 2015-01-01   109      109   109
# 7     2 2015-02-01    36      145   145
# 8     2 2015-02-15   102      247   247
# 9     2 2015-02-20    97      344   344
#10     2 2015-03-01    97      441   441
# … with 11 more rows

Upvotes: 0

Related Questions