R: Counting dates within time intervals

Question

Assume we have the following input data:

df.in <- data.frame(event = c(1,2,3,4,5), 
                    start = c("2015-01-01", "2015-01-01", "2015-01-02",
                              "2015-01-02", "2015-01-03"),
                    end = c("2015-01-03", "2015-01-04", "2015-01-03",
                            "2015-01-05", "2015-01-05"))
df.in$start <- as.Date(df.in$start, "%Y-%m-%d")
df.in$end <- as.Date(df.in$end, "%Y-%m-%d")

> df.in
  event      start        end
1     1 2015-01-01 2015-01-03
2     2 2015-01-01 2015-01-04
3     3 2015-01-02 2015-01-03
4     4 2015-01-02 2015-01-05
5     5 2015-01-03 2015-01-05

The goal is to count, for each date, how many events are active on it (start date inclusive, end date exclusive), in order to fill out this data frame:

df.out <- data.frame(date = c("2015-01-01", "2015-01-02", "2015-01-03", 
                              "2015-01-04", "2015-01-05"),
                     count = 0)
df.out$date <- as.Date(df.out$date, "%Y-%m-%d")
> df.out
        date count
1 2015-01-01     0
2 2015-01-02     0
3 2015-01-03     0
4 2015-01-04     0
5 2015-01-05     0

Conceptually it would look something like this:

#1 **
#2 ****
#3 ***
#4 **
#5

So, my current idea is a loop:

for(i in seq_along(df.out$date)){
  temp.df <- df.in[df.in$start <= df.out$date[i],]
  df.out$count[i] <- nrow(temp.df) - nrow(temp.df[temp.df$end <= df.out$date[i],])
}
> df.out
        date count
1 2015-01-01     2
2 2015-01-02     4
3 2015-01-03     3
4 2015-01-04     2
5 2015-01-05     0

It works, but I am afraid that the temp.df I create on every iteration could grow very large, given that the number of events can easily reach tens or even hundreds of thousands.
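To illustrate the concern, the intermediate data frame is not strictly needed: the same count can be taken from a logical mask over the two date columns. The following is only a minimal sketch under the assumptions of the example above (same df.in and df.out, start inclusive, end exclusive). It still does one pass over all events per date, so it reduces copying but not the overall amount of comparison work.

```r
# Rebuild the example data from the question
df.in <- data.frame(
  event = 1:5,
  start = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02",
                    "2015-01-02", "2015-01-03")),
  end   = as.Date(c("2015-01-03", "2015-01-04", "2015-01-03",
                    "2015-01-05", "2015-01-05"))
)
df.out <- data.frame(
  date = seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by = "day")
)

# For each date, count events with start <= date < end.
# sum() over a logical vector counts the TRUE values, so no subset
# of df.in is ever materialised.
df.out$count <- vapply(
  seq_along(df.out$date),
  function(i) {
    d <- df.out$date[i]
    sum(df.in$start <= d & df.in$end > d)
  },
  integer(1)
)

print(df.out)
```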

So my question is: is there a more efficient way to do this? Perhaps with a date package such as lubridate, or some other way to vectorize the whole thing?
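One possible vectorized direction, sketched here with base R only rather than lubridate: the loop computes "events started on or before the date" minus "events ended on or before the date", and both quantities can be read off sorted vectors with findInterval(). This is a sketch under the assumptions of the example data, not a benchmarked solution.

```r
# Rebuild the example data from the question
df.in <- data.frame(
  event = 1:5,
  start = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02",
                    "2015-01-02", "2015-01-03")),
  end   = as.Date(c("2015-01-03", "2015-01-04", "2015-01-03",
                    "2015-01-05", "2015-01-05"))
)
df.out <- data.frame(
  date = seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by = "day")
)

# findInterval(x, vec) returns, for each x, how many elements of the
# sorted vector vec are <= x. Dates are converted to day numbers first.
dates   <- as.numeric(df.out$date)
started <- findInterval(dates, sort(as.numeric(df.in$start)))
ended   <- findInterval(dates, sort(as.numeric(df.in$end)))

# Active on a date = started on or before it, and not yet ended
# (an event whose end equals the date counts as ended, so end is exclusive).
df.out$count <- started - ended

print(df.out)
```

Sorting each column once and then looking up every date avoids rescanning all events for each date, which is where the loop spends its time as the number of events grows.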
