Reputation: 5424
I have data on spells with start and end dates, by a person identifier.
temp <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), spell = c(1L, 2L, 3L, 1L, 2L, 3L), date1 = structure(c(14611,
14654, 15141, 14853, 14867, 14975), class = "Date"), date2 = structure(c(14612,
14656, 15142, 14862, 14872, 14976), class = "Date")), class = "data.frame", .Names = c("id",
"spell", "date1", "date2"), row.names = c(NA, -6L))
I would like to calculate a rolling total of the days in spells within the last year (defined by a 365 day window), so that each row gets a total number of days that are included in the spells, prior to the current, within one year.
So in my example, person 1 has 3 spells, two of which occurred within a one year window (rows 1 and 2). The first spell has no prior and therefore the days.observed.in.past.yr is 0. The second has a prior spell (row 1), of 1 days length, so days.observed.in.past.yr is 0.
Person 2 has three spells, all occurring within one year. For spell 2, person 2, the prior spell was nine days long and so days.observed.in.past.yr is 9. Then for spell three, there are two prior spells and days.observed.in.past.yr is 14 (9+5). This would continue adding for as many spells there are within a 365 day window for the current row.
id spell date1 date2 days.observed.in.past.yr
1 1 2010-01-02 2010-01-03 0
1 2 2010-02-14 2010-02-16 1
1 3 2011-06-16 2011-06-17 0
2 1 2010-09-01 2010-09-10 0
2 2 2010-09-15 2010-09-20 9
2 3 2011-01-01 2011-01-02 14
However, beyond calculating the trivial things like spell length and the date one year earlier I've got no idea how to do this. The nearest similar problem that I've found is rgolf:rolling window but I don't know how to apply this to my problem.
Can anyone help?
Upvotes: 1
Views: 3144
Reputation: 2877
This might be possible to achieve with runner package. OP required rolling sum over last 365 days excluding current observation. sum_run calculate sum of date2 - date1
in 365 days period. To exclude current row one can substract - spell_days
or use a lag = 1
(exclude current day).
library(dplyr)
library(runner)
temp %>%
group_by(id) %>%
mutate(
spell_days = date2 - date1,
days.observed.in.past.yr = sum_run(
x = spell_days,
k = 365,
idx = date1
) - spell_days
)
# id spell date1 date2 spell_days days.observed.in.past.yr
# <fct> <int> <date> <date> <drtn> <drtn>
# 1 1 1 2010-01-02 2010-01-03 1 days 0 days
# 2 1 2 2010-02-14 2010-02-16 2 days 1 days
# 3 1 3 2011-06-16 2011-06-17 1 days 0 days
# 4 2 1 2010-09-01 2010-09-10 9 days 0 days
# 5 2 2 2010-09-15 2010-09-20 5 days 9 days
# 6 2 3 2011-01-01 2011-01-02 1 days 14 days
Upvotes: 0
Reputation: 70336
Here's one way to do that using dplyr
:
require(dplyr)
temp %>%
mutate(year1 = format(date1, "%Y"),
year2 = format(date2, "%Y")) %>%
group_by(id) %>%
mutate(count = ifelse(lag(year1, 1, default = 0) == year1, lag(date2, 1) - lag(date1,1), 0 )) %>%
select(-c(year1, year2))
#Source: local data frame [5 x 4]
#Groups: id
#
# id date1 date2 count
#1 1 2010-01-02 2010-01-03 0
#2 1 2010-02-14 2010-02-16 1
#3 1 2011-06-16 2011-06-17 0
#4 2 2010-09-01 2010-09-10 0
#5 2 2010-09-15 2010-09-20 9
To do this using a 365 day "rolling" window, you can use the following:
temp %>%
group_by(id) %>%
mutate(count = ifelse(date1 - lag(date1, 1, default = 0) <= 365, lag(date2, 1) - lag(date1,1), 0))
The result is the same as above for the sample data.
I thought about this again and had some doubts that the first edit actually functions as expected since it only looks into the previous row to check if that row is less than 365 days before the current date1
. So I came up with another version which looks into all rows per id, creates groups of data that is within a 365 day window and then sums up the date differences - maybe this is what you want.
df %>%
group_by(id) %>%
arrange(id, date1) %>%
mutate(delta = floor(c(0, diff(date1)) / 365),
delta = cumsum(delta)) %>%
group_by(delta, add = TRUE) %>%
mutate(count = cumsum(as.numeric(date2-date1)) - (date2 - date1)) %>%
ungroup() %>%
select(-delta)
The question that is not really clear to me is whether you only want to sum up date differences in the 1 previous row (if it exists and it is less than 365 days earlier, in that case edit 1 should work) or do you want to sum up all previous differences that are less than 365 days earlier (in that case edit 2 should work).
Upvotes: 2