James Crumpler

Reputation: 204

Dates %within% Intervals

I'm running into a real head-scratcher and am not sure how to resolve it. Really hoping some of you may be able to help. Also, this is the first time I've ever contributed to StackOverflow... yay!

library(tidyverse)
library(lubridate)

start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')
PayPeriod_Interval <- int_diff(PayPeriod_EndDate)

This creates a vector of intervals, with each interval representing a pay period two weeks in length. This is part one, and part one is relatively easy (though it still took a while to figure out, ha).

Part two contains a vector of dates.

Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")

I want to identify Dates %within% Intervals, with the output being the interval that each date falls within. So the date "2014-07-08" will be assigned 2014-06-28 UTC--2014-07-12 UTC, since this date is within this interval.

A very similar problem seems to have been explored here: https://github.com/tidyverse/lubridate/issues/658

I have attempted the following

ymd(Dates) %within% PayPeriod_Interval

However, the result is only calculated for the first element in the Dates vector. I have since tried various combinations of for loops, mutating into factors, etc., with little progress. This is work related, so I am really short on time and will be monitoring this post throughout the day and into the weekend.

Best and thank you! James

Upvotes: 1

Views: 1259

Answers (2)

Dave2e

Reputation: 24069

The tidyverse is very useful, but sometimes base R is all you need. In this case, the cut function does the job.

library(lubridate)

start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')

Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")


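# cut() returns a factor labelled with the start date of each pay period
startperiod <- cut(as.Date(Dates), breaks = PayPeriod_EndDate)
# the last day of a two-week period is 13 days after its start
endperiod <- as.Date(startperiod) + 13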

The output of the cut function is a factor whose labels are the start dates of the pay periods in which the elements of "Dates" fall. Adding 13 days to a start date gives the last day of that pay period.
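As a rough sketch of how the pieces above could be collected into one table, the following uses base R only (as.Date instead of ymd) and a shortened, illustrative Dates vector. The names period_start, period_last_day and result are made up for this example and are not part of the original answer.

```r
# Pay-period boundaries: every two weeks from the first start date
breaks <- seq(as.Date("2014-06-28"), as.Date("2019-06-30"), by = "2 weeks")

# A shortened sample of the dates from the question
Dates <- as.Date(c("2014-07-08", "2018-10-20", "2018-10-02"))

# cut() bins each date into [break_i, break_i+1) and labels the bin
# with its start date; the result is a factor
startperiod <- cut(Dates, breaks = breaks)

# Convert the factor labels back to Date before doing date arithmetic
period_start <- as.Date(as.character(startperiod))

# Hypothetical summary table: one row per input date
result <- data.frame(
  date            = Dates,
  period_start    = period_start,
  period_last_day = period_start + 13
)
print(result)
```

Note that any date on or after the last break would come back as NA, so the breaks need to extend past the latest date being classified.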

Upvotes: 4

kath

Reputation: 7724

This is what a map solution could look like:

map(ymd(Dates), ~ PayPeriod_Interval[.x %within% PayPeriod_Interval])
# [[1]]
# [1] 2014-06-28 UTC--2014-07-12 UTC
# 
# [[2]]
# [1] 2018-10-13 UTC--2018-10-27 UTC
# 
# ...

To get the result as an interval vector (and not a list) you can use:

PayPeriod_Interval[map_int(ymd(Dates), ~ which(.x %within% PayPeriod_Interval))]

# [1] 2014-06-28 UTC--2014-07-12 UTC 2018-10-13 UTC--2018-10-27 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-11-24 UTC--2018-12-08 UTC
# [6] 2018-11-24 UTC--2018-12-08 UTC 2019-01-05 UTC--2019-01-19 UTC 2019-01-19 UTC--2019-02-02 UTC 2019-03-02 UTC--2019-03-16 UTC 2018-09-29 UTC--2018-10-13 UTC
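One thing to watch, as far as I understand %within%: it treats both ends of an interval as included, so a date that falls exactly on a pay-period boundary would match two intervals, and map_int would then complain about getting two values. None of the dates in the question hit a boundary. If that can happen in your data, a possible alternative (not the approach above, but a swapped-in base R technique) is findInterval, which assigns each date to exactly one half-open period. The names breaks, dates, idx and periods are just for this sketch, and it assumes every date lies between the first and last break.

```r
# Pay-period boundaries, built with base R instead of lubridate
breaks <- seq(as.Date("2014-06-28"), as.Date("2019-06-30"), by = "2 weeks")

# Sample dates; "2014-07-12" sits exactly on a boundary
dates <- as.Date(c("2014-07-08", "2014-07-12", "2018-10-20"))

# findInterval() returns, for each date, the index i with
# breaks[i] <= date < breaks[i + 1]
idx <- findInterval(as.numeric(dates), as.numeric(breaks))

# Look up the start and end boundary of the matched period
periods <- data.frame(
  date  = dates,
  start = breaks[idx],
  end   = breaks[idx + 1]
)
print(periods)
```

Here the boundary date is placed in the period that starts on it, not the one that ends on it.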

If you are just interested in the end date of the interval, an option is:

PayPeriod_EndDate[map_int(ymd(Dates), ~ which.min(.x > PayPeriod_EndDate))]
# [1] "2014-07-12" "2018-10-27" "2018-12-22" "2018-12-22" "2018-12-08" "2018-12-08" "2019-01-19" "2019-02-02" "2019-03-16" "2018-10-13"

Applied to the logical vector .x > PayPeriod_EndDate, which.min returns the index of the first FALSE, i.e. the first entry of PayPeriod_EndDate that is not smaller than the given date in the Dates vector. That entry is the date at the end of the corresponding pay period.
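To make the which.min trick a bit more concrete, here is a small base R illustration with three made-up end dates. The variable names (ends, d, is_later and so on) are only for this sketch.

```r
# Three consecutive pay-period end dates (illustrative values)
ends <- as.Date(c("2014-07-12", "2014-07-26", "2014-08-09"))
d <- as.Date("2014-07-15")

# Compare the date against every end date
is_later <- d > ends

# which.min() on a logical vector gives the position of the first FALSE,
# i.e. the first end date that is not before d
first_idx <- which.min(is_later)
matched_end <- ends[first_idx]
print(matched_end)

# Caveat: if the date is later than every end date, the vector is all TRUE
# and which.min() falls back to position 1 instead of signalling a problem
too_late <- as.Date("2020-01-01")
fallback_idx <- which.min(too_late > ends)
print(fallback_idx)
```

So it is worth checking that all dates are covered by PayPeriod_EndDate before relying on this.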

Upvotes: 1
