Reputation: 89
I searched for a while but seems most of the days counting methods are for two dates only(i.e., single date range). I don't know if there's a way faster than using a for-loop to calculate days between multiple date ranges.
Here's my data DF:
ItemName StartDate EndDate
Apple 2014-01-02 2018-01-01
Orange 2014-08-06 2017-02-03
Banana 2015-03-01 2019-12-31
...
Now I have a reporting date range: 2015-01-01 to 2019-12-31. I want to count days of those items covered by the reporting dates by quarter.
For example, the first quarter of the reporting dates are between 2015-01-01 & 2015-03-31 (including both end)
ItemName StartDate EndDate CountDays SampleQuarter
Apple 2014-01-02 2018-01-02 90 2015-Q1
Orange 2014-08-06 2017-02-03 90 2015-Q1
Banana 2015-03-01 2019-12-31 31 2015-Q1
...
Apple 2014-01-02 2018-01-02 2 2018-Q1
Orange 2014-08-06 2017-02-03 0 2018-Q1
Banana 2015-03-01 2019-12-31 31 2018-Q1
....
My goal is to use the above result to get this table:
SampleQuarter TotalDays
2015-Q1 211
2015-Q2 273
...
2018-Q1 33
2018-Q2 91
...
I use a for loop to handle it. First I created a reference date list:
Date = c("2015-01-01","2015-04-01","2015-06-01",....)
TotalCount = NULL
for(i in 1:length(Date)){
START = as.Date(Date[i])
END = START %m+% months(3) -1 ## the end of each quarter
## find items that start and end date range is covered by the selected quarter
## than calculated the days covered by the selected quarter only
df = DF %>% filter( StartDate <=END & EndDate >=END ) %>%
mutate(StartDateNEW = ifelse(StartDate <= START,START, StartDate ))%>%
mutate(EndDateNEW = ifelse(EndDate <= END,EndDate , END)) %>%
mutate(CountDays= EndDateNEW -StartDateNEW +1) %>%
select(-StartDateNEW, EndDateNEW ) %>%
mutate(SampleQuarter =paste0( format(START , "%Y") ,"-Q",quarter(START) ) %>%
group_by(SampleQuarter) %>% summarise(TotalDays = sum(CountDays))
TotalCount = rbind(TotalCount ,df)
START = START %m+% months(3) ## the beginning of next quarter
}
For a small sample of data, the for-loop works well. When the sample size is huge, the code will take a while. I wonder if there's a method that I can bypass the for-loop and make the whole process faster?
Thanks in advance.
Upvotes: 1
Views: 144
Reputation: 887501
Perhaps we can use
library(dplyr)
library(purrr)
library(tidyr)
library(zoo)
DF %>%
mutate(across(ends_with("Date"), as.Date),
new = map2(StartDate, EndDate,
~ tibble(days = seq(.x, .y, by = 'day'),
Quarter = cut(days, breaks = seq(.x, .y, by = 'quarter'))))) %>%
unnest(c(new)) %>%
group_by(ItemName, StartDate, EndDate, Quarter = as.yearqtr(Quarter)) %>%
summarise(n = n(), .groups = 'drop')
Upvotes: 2