dplyr or data.table to calculate time series aggregations in R

Question

I'm trying to summarize a data.frame which contains date (or time) information.

Let's suppose this one containing hospitalization records by patient:

df <- data.frame(
  patient.id = c(1, 2, 1, 1, 2, 2),
  hospitalization.date = as.Date(c("2013/10/15", "2014/10/15", "2015/7/16",
                                   "2016/1/7", "2015/12/20", "2015/12/25"))
)

df looks like this:

> df
      patient.id hospitalization.date
    1          1           2013-10-15
    2          2           2014-10-15
    3          1           2015-07-16
    4          1           2016-01-07
    5          2           2015-12-20
    6          2           2015-12-25

For each observation, I need to count the patient's hospitalizations that occurred in the 365 days up to and including that hospitalization (so a patient's first stay gets a count of 1).

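In my example, the result would be the new column df$hospitalizations.last.year (rows sorted by date; the output also shows one more record, patient 2 on 2016-02-10, that is not in the sample df above):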

> df
      patient.id hospitalization.date hospitalizations.last.year
    1          1           2013-10-15                          1
    2          2           2014-10-15                          1
    3          1           2015-07-16                          1
    4          2           2015-12-20                          1
    5          2           2015-12-25                          2
    6          1           2016-01-07                          2
    7          2           2016-02-10                          3

Note that the counter includes previous records within the last 365 days, not only those in the same calendar year.
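To pin down the rule, here is a minimal dplyr sketch (not tuned for speed): within each patient, it counts the stays whose date falls in the closed window from 365 days before up to the current date, so the current stay is counted too. It assumes a stay exactly 365 days earlier should count. The `same.calendar.year` column is a hypothetical contrast that counts only stays in the same calendar year, up to the current one. Because it compares every pair of stays within a patient, the cost grows quadratically with the number of stays per patient.

```r
library(dplyr)

df <- data.frame(
  patient.id = c(1, 2, 1, 1, 2, 2),
  hospitalization.date = as.Date(c("2013-10-15", "2014-10-15", "2015-07-16",
                                   "2016-01-07", "2015-12-20", "2015-12-25"))
)

result <- df %>%
  group_by(patient.id) %>%
  mutate(
    # Rolling window: this patient's stays dated in [date - 365, date]
    # (assumption: the lower bound is inclusive)
    hospitalizations.last.year = sapply(hospitalization.date, function(d)
      sum(hospitalization.date >= d - 365 & hospitalization.date <= d)),
    # Hypothetical contrast column: this patient's stays in the same
    # calendar year, up to and including the current one
    same.calendar.year = sapply(hospitalization.date, function(d)
      sum(format(hospitalization.date, "%Y") == format(d, "%Y") &
            hospitalization.date <= d))
  ) %>%
  ungroup()

result
```

On the sample df the two columns differ only for patient 1 on 2016-01-07: the rolling count is 2 because the 2015-07-16 stay is 175 days earlier, whereas the calendar-year count is 1.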

I'm trying to do this with dplyr or data.table because my dataset is huge and performance matters. Is it possible?
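One data.table option is a self non-equi join: join the table to itself on `patient.id` and on the date range of each record, then count the matches with `.N` and `by = .EACHI`. This is a sketch under stated assumptions, not a benchmarked solution: the helper columns `window.start` and `window.end` are hypothetical names, the window is closed (a stay exactly 365 days earlier counts), and non-equi joins require data.table 1.9.8 or later.

```r
library(data.table)

dt <- data.table(
  patient.id = c(1, 2, 1, 1, 2, 2),
  hospitalization.date = as.Date(c("2013-10-15", "2014-10-15", "2015-07-16",
                                   "2016-01-07", "2015-12-20", "2015-12-25"))
)

# Hypothetical helper columns: the closed window [date - 365, date] of each record
dt[, `:=`(window.start = hospitalization.date - 365,
          window.end   = hospitalization.date)]

# Self non-equi join: for each row of i (the current record), count the rows
# of x with the same patient.id and a date inside i's window
counts <- dt[dt,
             on = .(patient.id,
                    hospitalization.date >= window.start,
                    hospitalization.date <= window.end),
             .N, by = .EACHI]

# by = .EACHI returns one row per row of i, in i's original order
dt[, hospitalizations.last.year := counts$N]
dt[, c("window.start", "window.end") := NULL]

dt
```

Since the counting happens inside the join rather than in an R-level loop over rows, this should scale better than a per-row `sapply`, though the actual gain depends on how many stays each patient has.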
