Count number of occurences in date range in R

I have a dataframe with a number of accounts, their status and the start and endtime for that status. I would like to report on the number of accounts in each of these statuses over a date range. The data looks like the df below, with the resulting report. (Actual data contains more state values. N/A values are shown with a dummy date far in the future.)

df <- data.frame(account = c(1,1,2,3),
             state = c("Open","Closed","Open","Open"),
             startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"), 
             enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
             )

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6), 
                 number.open = c(2,2,2,1,1,1)
                 )

I have looked at options involving rowwise() and mutate from dplyr and foverlaps from data.table, but haven't been able to code it up so it works. (See Checking if Date is Between two Dates in R)

Upvotes: 0

Views: 5460

Answers (1)

bouncyball
bouncyball

Reputation: 10761

We can use sapply to do this for us:

report$NumberOpen <- 
    sapply(report$date, function(x)
    sum(as.Date(df1$startdate) < as.Date(x) &
    as.Date(df1$enddate) > as.Date(x) & 
    df1$state == 'Open'))

#  report
#         date NumberOpen
# 1 2016-04-01          2
# 2 2016-04-02          2
# 3 2016-04-03          2
# 4 2016-04-04          1
# 5 2016-04-05          1
# 6 2016-04-06          1

data

df1 <- data.frame(account = c(1,1,2,3),
                 state = c("Open","Closed","Open","Open"),
                 startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"), 
                 enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6)
)

Upvotes: 2

Related Questions