jmuhlenkamp
jmuhlenkamp

Reputation: 2150

Subset dates with a given weekday and select next date if weekday is missing

I'm able to find a lot of info on SO regarding handling subsetting dates to a certain weekday (e.g. Get Dates of a Certain Weekday from a Year in R). However, I am unable to find any that implement a fallback logic that I'd like. Specifically, if a given weekday does not exist in a given week, I'd like to grab the next available date, excluding Saturday and Sunday.

For example, from a vector of dates, I want to select all dates corresponding to Thursdays. However, in weeks where Thursdays are missing, I should instead pick the date of the next working day. In the example below, this is the following day, the Friday.

library(lubridate)

# Create some dates
dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)

# Remove Thursday, November 23
dates <- dates[dates != as.Date("2017-11-23")]

# Get all Thursdays in dates
dates[wday(dates) == 5]
# [1] "2017-11-16"

# Desired Output:
# Because Thursday 2017-11-23 is missing in a week,
# we roll over and select Friday 2017-11-24 instead  
# [1] "2017-11-16" "2017-11-24"

Note 1: For a given week where Thursday is missing and Friday is missing as well, I'd want to roll over to Monday. Essentially, for weeks where a Thursday is not found, grab the next date among the available dates.

Note 2: I'd like to accomplish this without any external dependencies other than common R packages such as lubridate, etc. (e.g. no dependency on a c++ library).

I'm confident I could write something to do what I want, but I am having trouble finding of creating something short and elegant.

Upvotes: 0

Views: 409

Answers (3)

Henrik
Henrik

Reputation: 67778

An alternative with findInterval.

Create a sequence of dates ('tmp'), from the focal weekday ('wd') in the week of min 'dates', to max 'dates'.

Select dates corresponding to the focal weekday ('wds').

Select working days from 'dates' ('dates_1_5').

Use findInterval to roll 'wds' to closest available working day in 'dates_1_5'.

f <- function(wd, dates){
  tmp <- seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
                     format = "%Y-%W-%u"),
             max(dates), by = 1)

  wds <- tmp[as.integer(format(tmp, "%u")) == wd]

  dates_1_5 <- dates[as.integer(format(dates, "%u")) %in% 1:5]

  dates_1_5[findInterval(wds, dates_1_5, left.open = TRUE) + 1]
}

Some examples:

d <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)

dates <- d[d != as.Date("2017-11-23")]
f(wd = 4, dates)
# [1] "2017-11-16" "2017-11-24"

dates <- d[d != as.Date("2017-11-16")]
f(wd = 4, dates)
# [1] "2017-11-17" "2017-11-23"

dates <- d[!(d %in% as.Date(c("2017-11-16", "2017-11-17", "2017-11-21", "2017-11-23")))]
f(wd = 2, dates)
# [1] "2017-11-20" "2017-11-22"

Slightly more compact using a data.table rolling join:

library(data.table)

wd <- 2
# using 'dates' from above

d1 <- data.table(dates)
d2 <- data.table(dates = seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
                                     format = "%Y-%W-%u"),
                             max(dates), by = 1))

d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
                         on = "dates", .(x.dates), roll = -Inf]

...or a non-equi join:

d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
                         on = .(dates >= dates), .(x.dates), mult = "first"]

If desired, just wrap in a function as above.

Upvotes: 1

TinglTanglBob
TinglTanglBob

Reputation: 647

might not be the most ellegant way, but i think it should work :)

library(lubridate)


dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-30"), by = 1) #your dates
dates <- dates[dates != as.Date("2017-11-23")] # thursday
dates <- dates[dates != as.Date("2017-11-24")] # friday
dates <- dates[dates != as.Date("2017-11-25")] # satureday
dates <- dates[dates != as.Date("2017-11-26")] # sunday
dates <- dates[dates != as.Date("2017-11-27")] # monday
dates <- dates[dates != as.Date("2017-11-28")] # tuesday
#dates <- dates[dates != as.Date("2017-11-29")] # wednesday

dates_shall_be <- seq.Date(min(dates)-wday(min(dates))+1, max(dates), by = 1) # create a shall-be list of days within your date-range
# min(dates)-wday(min(dates))+1 shiftback mindate to get missing thursdays in week one

thuesdays_shall = dates_shall_be[wday(dates_shall_be) == 5] # get all thuesdays that should be in there

for(i in 1:6) # run threw all possible followup days till wednesday next week 
{
  thuesdays_shall[!thuesdays_shall %in% dates] = thuesdays_shall[!thuesdays_shall %in% dates] + 1 # if date is not present in your data add another day to it
}

thuesdays_shall[!thuesdays_shall %in% dates] = NA # if date is still not present in the data after 6 shifts, this thursday + the whole followup days till next thursday are missing and NA is taken
thuesdays_shall

Upvotes: 0

Oliver Baumann
Oliver Baumann

Reputation: 2289

I'm breaking your condition of "no external dependencies", but as you already use lubridate (which is a dependency ;-) ), I'll provide you a solution that utilizes lead and lag from dplyr. You could write write those yourself though, looking at the source, if it really is a hard condition.

What I'm doing is figuring out where the "skips" are in the sequence by computing a kind of running diff of days. Once we know where the skip is, we just roll over to the next data in the sequence, whatever that is. Now, it might well be that this isn't a Friday, but a Saturday. In that case you're going to have to figure out if you still want the next Friday, even if there is a Thurday in between.

library(dplyr)

rollover_to_next <- function(dateseq, the_day = 5) {
  day_diffs <- lead(wday(dateseq) - lag(wday(dateseq))) %% 7
  skips <- which(day_diffs > 1) 

  sort(c(dateseq[wday(dateseq) == the_day], dateseq[skips + 1]))
}

dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)
dates <- dates[dates != as.Date("2017-11-23")]

rollover_to_next(dates)

Output:

[1] "2017-11-16" "2017-11-24"

You might have to account for the edge case where the idx + 1 element doesn't exist, but I'll leave that up to you to handle.

Upvotes: 0

Related Questions