Reputation: 295
I want to sum all values for a variable for the past 3 months according to exact day for each ID. There might be 2 obs for some ID, 70 obs for another ID etc. For example, on April 15 I want to look all the way back to the January 16 and sum all values on this interval including January 16 and April 15. There could by 2 observations on this interval or even 70.
The window size is therefore never the same which means I can't use rollapply from the zoo package.
Here is a small example data with expected output.
df <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 3, 3, 3), date = c("01/01/2015", "01/02/2015",
"15/02/2015", "01/03/2015", "01/04/2015", "01/01/2015", "01/02/2015",
"15/02/2015", "20/02/2015", "01/03/2015", "14/03/2015", "01/04/2015",
"01/01/2015", "10/02/2015", "15/02/2015", "20/02/2015", "25/02/2015",
"01/03/2015", "10/03/2015", "20/03/2015", "01/04/2015"), value = c(106,
57, 43, 105, 180, 109, 36, 102, 97, 97, 113, 83, 178, 19, 80,
167, 40, 5, 30, 124, 8), expected = c(106, 163, 206, 311, 385,
109, 145, 247, 344, 441, 554, 528, 178, 197, 277, 444, 484, 489,
519, 643, 473)), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Views: 264
Reputation: 269346
The width
in rollapply
can be a vector of widths and we can compute those using findInterval
:
library(dplyr)
library(lubridate)
library(zoo)
df %>%
group_by(id) %>%
mutate(date = dmy(date),
expected2 = rollapplyr(value,
width = 1:n() - findInterval(date - months(3), date),
FUN = sum, na.rm = TRUE, fill = NA)) %>%
ungroup
giving:
# A tibble: 21 x 5
id date value expected expected2
<dbl> <date> <dbl> <dbl> <dbl>
1 1 2015-01-01 106 106 106
2 1 2015-02-01 57 163 163
3 1 2015-02-15 43 206 206
4 1 2015-03-01 105 311 311
5 1 2015-04-01 180 385 385
6 2 2015-01-01 109 109 109
7 2 2015-02-01 36 145 145
8 2 2015-02-15 102 247 247
9 2 2015-02-20 97 344 344
10 2 2015-03-01 97 441 441
# ... with 11 more rows
Upvotes: 2
Reputation: 388797
We convert the date
to actual date
object, group_by
id
and for each date
find out the dates in the group which lie in last 3 months and take sum
of their respective value
.
library(dplyr)
df %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
group_by(id) %>%
mutate(ans = purrr::map_dbl(date, ~ sum(value[
date > (. - months(3)) & date <= .])))
# id date value expected ans
# <dbl> <date> <dbl> <dbl> <dbl>
# 1 1 2015-01-01 106 106 106
# 2 1 2015-02-01 57 163 163
# 3 1 2015-02-15 43 206 206
# 4 1 2015-03-01 105 311 311
# 5 1 2015-04-01 180 385 385
# 6 2 2015-01-01 109 109 109
# 7 2 2015-02-01 36 145 145
# 8 2 2015-02-15 102 247 247
# 9 2 2015-02-20 97 344 344
#10 2 2015-03-01 97 441 441
# … with 11 more rows
Upvotes: 0