I understand that I can cut dates in the following manner:
library(tidyverse)
dates <- parse_date(c("2018-02-01", "2018-02-15", "2018-02-20", "2018-03-20"))
cut.dates <- cut(dates, breaks = parse_date(c("2018-01-01", "2018-02-10", "2018-12-31")))
table(cut.dates)
But how do I cut dates based on each date's position in the vector rather than on the actual date? I want to replace the third line shown above (the cut() call) with something like:
cut.dates <- cut(dates, c(0, 2, length(dates)))
Here 0 would be the initial position to start the cut, 2 would be a cut between the 1st and 2nd entries in the vector, and length(dates) would be the final cut, i.e. the last position in my vector.
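A minimal sketch of the position-based cut described above, offered as an assumption about the intent rather than as the answer's approach. `cut()` on a `Date` rejects a numeric vector of breaks (a single number there means "this many intervals"), so this cuts the positions `seq_along(dates)` instead; the resulting factor lines up element by element with `dates`. Base `as.Date()` stands in for readr's `parse_date()` so the snippet runs without the tidyverse.

```r
# The four dates from the question (as.Date used instead of readr::parse_date)
dates <- as.Date(c("2018-02-01", "2018-02-15", "2018-02-20", "2018-03-20"))

# Cut the positions 1..n rather than the dates themselves.
# With the default right = TRUE the intervals are (0,2] and (2,4],
# so positions 1 and 2 land in the first group.
pos_groups <- cut(seq_along(dates), breaks = c(0, 2, length(dates)))
table(pos_groups)

# The dates belonging to each positional group
split(dates, pos_groups)
```

If the first group should hold only the first entry, use `breaks = c(0, 1, length(dates))` instead. Sort `dates` first if positions are meant to follow chronological order.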
I think what you want is for the break dates to be determined dynamically from the data instead of being specified manually.
I'll generate some more dates, since it's hard to test four dates on a quarterly basis when they are all in the same quarter.
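set.seed(2)
# Dates depend on Sys.Date() (and on sample()'s algorithm, changed in R 3.6.0), so your output will differ
( dates <- sort(Sys.Date() + sample(365, size=20)) )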
# [1] "2018-06-12" "2018-07-03" "2018-07-17" "2018-07-20" "2018-07-24"
# [6] "2018-08-04" "2018-08-10" "2018-10-07" "2018-10-19" "2018-11-01"
# [11] "2018-11-29" "2018-11-30" "2018-12-12" "2019-01-28" "2019-02-10"
# [16] "2019-03-12" "2019-04-22" "2019-04-23" "2019-05-10" "2019-05-13"
Compute the start and end dates, rounded outward to whole quarters:
( start <- lubridate::floor_date(min(dates), unit="quarter") )
# [1] "2018-04-01"
( end <- lubridate::ceiling_date(max(dates), unit="quarter") )
# [1] "2019-07-01"
We are interested in quarters, so build breaks one quarter apart:
( brks <- seq(start, end, by="quarter") )
# [1] "2018-04-01" "2018-07-01" "2018-10-01" "2019-01-01" "2019-04-01"
# [6] "2019-07-01"
cut(dates, breaks=brks)
# [1] 2018-04-01 2018-07-01 2018-07-01 2018-07-01 2018-07-01 2018-07-01
# [7] 2018-07-01 2018-10-01 2018-10-01 2018-10-01 2018-10-01 2018-10-01
# [13] 2018-10-01 2019-01-01 2019-01-01 2019-01-01 2019-04-01 2019-04-01
# [19] 2019-04-01 2019-04-01
# Levels: 2018-04-01 2018-07-01 2018-10-01 2019-01-01 2019-04-01
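Because `Sys.Date()` changes, the sampled dates above will not reproduce. This sketch hard-codes the twenty printed dates and the printed quarterly breaks (base R only, no lubridate) and tabulates the result with `table()`, the same way the question does.

```r
# The 20 dates printed above, hard-coded so the result does not depend on Sys.Date()
dates <- as.Date(c(
  "2018-06-12", "2018-07-03", "2018-07-17", "2018-07-20", "2018-07-24",
  "2018-08-04", "2018-08-10", "2018-10-07", "2018-10-19", "2018-11-01",
  "2018-11-29", "2018-11-30", "2018-12-12", "2019-01-28", "2019-02-10",
  "2019-03-12", "2019-04-22", "2019-04-23", "2019-05-10", "2019-05-13"
))

# Same breaks as brks above: calendar-quarter starts from 2018-04-01 to 2019-07-01
brks <- seq(as.Date("2018-04-01"), as.Date("2019-07-01"), by = "quarter")

# Number of dates per quarter: 1, 6, 6, 3, 4
quarter_counts <- table(cut(dates, breaks = brks))
quarter_counts
```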
If you don't need to align on calendar quarters and just want to group the data three months at a time, starting from the first observed month, then you can instead do:
( start_m <- lubridate::floor_date(min(dates), unit="month") )
# [1] "2018-06-01"
( end_m <- lubridate::ceiling_date(max(dates) + 93L, unit="month") )
# [1] "2019-09-01"
( brks_m <- seq(start_m, end_m, by="quarter") )
# [1] "2018-06-01" "2018-09-01" "2018-12-01" "2019-03-01" "2019-06-01"
# [6] "2019-09-01"
(The magic 93L ensures there is at least one more quarter beyond the current month. It is only necessary because ceiling_date(..., unit="month") may not go far enough to cover three months from the start of the previous custom quarter. Generating too many breaks is not a bad thing: the extras simply show up as empty levels in the resulting factor.)
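The sketch below checks both claims in that note on the hard-coded dates printed earlier. First it applies the three-month breaks (equal to brks_m) and shows the empty final level, removed with `droplevels()`. Then it uses a hypothetical latest date of 2019-07-15 to show why the padding matters. `month_ceiling()` is a hypothetical base-R stand-in for `lubridate::ceiling_date(d, unit = "month")` on dates that are not already the 1st of a month; it is not a lubridate function.

```r
# The 20 dates printed above, hard-coded for reproducibility
dates <- as.Date(c(
  "2018-06-12", "2018-07-03", "2018-07-17", "2018-07-20", "2018-07-24",
  "2018-08-04", "2018-08-10", "2018-10-07", "2018-10-19", "2018-11-01",
  "2018-11-29", "2018-11-30", "2018-12-12", "2019-01-28", "2019-02-10",
  "2019-03-12", "2019-04-22", "2019-04-23", "2019-05-10", "2019-05-13"
))

# Breaks equal to brks_m above: three-month bins from 2018-06-01 to 2019-09-01
brks_m <- seq(as.Date("2018-06-01"), as.Date("2019-09-01"), by = "quarter")
bins <- cut(dates, breaks = brks_m)
table(bins)              # counts 7, 5, 3, 5, 0: the bin from 2019-06-01 is empty
table(droplevels(bins))  # drop the unused level if it gets in the way

# Hypothetical helper: first day of the month after d (assumes d is a single date)
month_ceiling <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  seq(first, by = "month", length.out = 2)[2]
}
last_date <- as.Date("2019-07-15")  # hypothetical latest observation

# Without padding the ceiling is 2019-08-01, so the last break is 2019-06-01
brks_short <- seq(as.Date("2018-06-01"), month_ceiling(last_date), by = "quarter")
cut(last_date, breaks = brks_short)  # NA: no interval contains the date

# With +93 days the ceiling is 2019-11-01, which adds a 2019-09-01 break
brks_pad <- seq(as.Date("2018-06-01"), month_ceiling(last_date + 93L), by = "quarter")
cut(last_date, breaks = brks_pad)    # falls in the bin starting 2019-06-01
```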