Reputation: 2150
I'm able to find a lot of info on SO regarding handling subsetting dates to a certain weekday (e.g. Get Dates of a Certain Weekday from a Year in R). However, I am unable to find any that implement a fallback logic that I'd like. Specifically, if a given weekday does not exist in a given week, I'd like to grab the next available date, excluding Saturday and Sunday.
For example, from a vector of dates, I want to select all dates corresponding to Thursdays. However, in weeks where Thursdays are missing, I should instead pick the date of the next working day. In the example below, this is the following day, the Friday.
library(lubridate)
# Create some dates
dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)
# Remove Thursday, November 23
dates <- dates[dates != as.Date("2017-11-23")]
# Get all Thursdays in dates
dates[wday(dates) == 5]
# [1] "2017-11-16"
# Desired Output:
# Because Thursday 2017-11-23 is missing in a week,
# we roll over and select Friday 2017-11-24 instead
# [1] "2017-11-16" "2017-11-24"
Note 1: For a given week where Thursday is missing and Friday is missing as well, I'd want to roll over to Monday. Essentially, for weeks where a Thursday is not found, grab the next date among the available dates.
Note 2: I'd like to accomplish this without any external dependencies other than common R packages such as lubridate, etc. (e.g. no dependency on a c++ library).
I'm confident I could write something to do what I want, but I am having trouble finding of creating something short and elegant.
Upvotes: 0
Views: 409
Reputation: 67778
An alternative with findInterval
.
Create a sequence of dates ('tmp'), from the focal weekday ('wd') in the week of min
'dates', to max
'dates'.
Select dates corresponding to the focal weekday ('wds').
Select working days from 'dates' ('dates_1_5').
Use findInterval
to roll 'wds' to closest available working day in 'dates_1_5'.
f <- function(wd, dates){
tmp <- seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
format = "%Y-%W-%u"),
max(dates), by = 1)
wds <- tmp[as.integer(format(tmp, "%u")) == wd]
dates_1_5 <- dates[as.integer(format(dates, "%u")) %in% 1:5]
dates_1_5[findInterval(wds, dates_1_5, left.open = TRUE) + 1]
}
Some examples:
d <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)
dates <- d[d != as.Date("2017-11-23")]
f(wd = 4, dates)
# [1] "2017-11-16" "2017-11-24"
dates <- d[d != as.Date("2017-11-16")]
f(wd = 4, dates)
# [1] "2017-11-17" "2017-11-23"
dates <- d[!(d %in% as.Date(c("2017-11-16", "2017-11-17", "2017-11-21", "2017-11-23")))]
f(wd = 2, dates)
# [1] "2017-11-20" "2017-11-22"
Slightly more compact using a data.table
rolling join:
library(data.table)
wd <- 2
# using 'dates' from above
d1 <- data.table(dates)
d2 <- data.table(dates = seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
format = "%Y-%W-%u"),
max(dates), by = 1))
d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
on = "dates", .(x.dates), roll = -Inf]
...or a non-equi join:
d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
on = .(dates >= dates), .(x.dates), mult = "first"]
If desired, just wrap in a function as above.
Upvotes: 1
Reputation: 647
might not be the most ellegant way, but i think it should work :)
library(lubridate)
dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-30"), by = 1) #your dates
dates <- dates[dates != as.Date("2017-11-23")] # thursday
dates <- dates[dates != as.Date("2017-11-24")] # friday
dates <- dates[dates != as.Date("2017-11-25")] # satureday
dates <- dates[dates != as.Date("2017-11-26")] # sunday
dates <- dates[dates != as.Date("2017-11-27")] # monday
dates <- dates[dates != as.Date("2017-11-28")] # tuesday
#dates <- dates[dates != as.Date("2017-11-29")] # wednesday
dates_shall_be <- seq.Date(min(dates)-wday(min(dates))+1, max(dates), by = 1) # create a shall-be list of days within your date-range
# min(dates)-wday(min(dates))+1 shiftback mindate to get missing thursdays in week one
thuesdays_shall = dates_shall_be[wday(dates_shall_be) == 5] # get all thuesdays that should be in there
for(i in 1:6) # run threw all possible followup days till wednesday next week
{
thuesdays_shall[!thuesdays_shall %in% dates] = thuesdays_shall[!thuesdays_shall %in% dates] + 1 # if date is not present in your data add another day to it
}
thuesdays_shall[!thuesdays_shall %in% dates] = NA # if date is still not present in the data after 6 shifts, this thursday + the whole followup days till next thursday are missing and NA is taken
thuesdays_shall
Upvotes: 0
Reputation: 2289
I'm breaking your condition of "no external dependencies", but as you already use lubridate
(which is a dependency ;-) ), I'll provide you a solution that utilizes lead
and lag
from dplyr
. You could write write those yourself though, looking at the source, if it really is a hard condition.
What I'm doing is figuring out where the "skips" are in the sequence by computing a kind of running diff of days. Once we know where the skip is, we just roll over to the next data in the sequence, whatever that is. Now, it might well be that this isn't a Friday, but a Saturday. In that case you're going to have to figure out if you still want the next Friday, even if there is a Thurday in between.
library(dplyr)
rollover_to_next <- function(dateseq, the_day = 5) {
day_diffs <- lead(wday(dateseq) - lag(wday(dateseq))) %% 7
skips <- which(day_diffs > 1)
sort(c(dateseq[wday(dateseq) == the_day], dateseq[skips + 1]))
}
dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)
dates <- dates[dates != as.Date("2017-11-23")]
rollover_to_next(dates)
Output:
[1] "2017-11-16" "2017-11-24"
You might have to account for the edge case where the idx + 1
element doesn't exist, but I'll leave that up to you to handle.
Upvotes: 0