Reputation: 783
I have this data frame with id
, id_month_year
, start_date
, end_date
, daylight_begin
, and daylight_end
, where id
is the ID for the record, id_month_year
is just a string of the record's ID, month and year. start_date
and end_date
are start and end dates of the record. daylight_begin
and daylight_end
are the beginning and ending dates of daylight savings time.
df <- data.frame(
id = c(rep("A01", 5), rep("B01", 4)),
id_month_year = c("A01 Jan 2023", "A01 March 2023", "A01 November 2022", "A01 June 2022",
"A01 March 2022", "B02 March 2022", "B02 November 2022", "B02 March 2023", "B02 May 2022"),
start_date = c("2023-01-04", "2023-03-01", "2022-11-01", "2022-06-05",
"2022-03-02", "2022-03-04", "2022-11-05", "2023-03-02", "2022-05-03"),
end_date = c("2023-01-25", "2023-03-24", "2022-11-27", "2022-06-24",
"2022-03-29", "2022-03-16", "2022-11-28", "2023-03-30", "2022-05-30")
) %>%
mutate(across(c(start_date, end_date), ~as.Date(.x)),
daylight_begin = if_else(year(start_date) == "2022", as_date("2022-03-13"), as_date("2023-03-12")),
daylight_end = if_else(year(end_date) == "2022", as_date("2022-11-06"), as_date("2023-11-05"))
)
For the start and end dates that are within the daylight savings date ranges, I want to break the date interval using the daylight savings dates as cutoff dates. i.e. if either daylight_begin
or daylight_end
dates are within the original start_date
and end_date
range, then I want to break the date range using the appropriate daylight_begin
or daylight_end
as a cutoff point.
The two years have different daylight_begin
and daylight_end
dates:
daylight_begin
= "2022-03-13"; daylight_end
= "2022-11-06"daylight_begin
= "2023-03-12"; daylight_end
= "2023-11-05"For example, for id_month_year
of "A01 March 2023," the start date is "2023-03-01" and the end date is "2023-03-24". Since this date range includes daylight_begin
of "2023-03-12", the new date ranges for "A01 March 2023" would be from:
start_date
= "2023-03-01" to end_date
= "2023-03-11"start_date
= "2023-03-12" to end_date
= "2023-03-24"I also want to duplicate the id
and id_month_year
for the row that needs to be broken down. So essentially I want to change the row from this
to that.
After breaking all the date ranges by daylight savings dates, I want my data frame to look like this:
new_df <- data.frame(
id = c(rep("A01", 8), rep("B01", 7)),
id_month_year = c("A01 Jan 2023", rep("A01 March 2023", 2), rep("A01 November 2022", 2), "A01 June 2022",
rep("A01 March 2022",2), rep("B02 March 2022",2), rep("B02 November 2022",2), rep("B02 March 2023",2), "B02 May 2022"),
start_date = c("2023-01-04", "2023-03-01", "2023-03-12", "2022-11-01", "2022-11-06", "2022-06-05",
"2022-03-02", "2022-03-13", "2022-03-04", "2022-03-13", "2022-11-05", "2022-11-06", "2023-03-02", "2023-03-12","2022-05-03"),
end_date = c("2023-01-25","2023-03-11", "2023-03-24", "2022-11-05", "2022-11-27", "2022-06-24",
"2022-03-12", "2022-03-29", "2022-03-12", "2022-03-16", "2022-11-05", "2022-11-28", "2023-03-11", "2023-03-30", "2022-05-30")
) %>%
mutate(across(c(start_date, end_date), ~as.Date(.x)),
daylight_begin = if_else(year(start_date) == "2022", as_date("2022-03-13"), as_date("2023-03-12")),
daylight_end = if_else(year(end_date) == "2022", as_date("2022-11-06"), as_date("2023-11-05"))
)
> new_df
id id_month_year start_date end_date daylight_begin daylight_end
1 A01 A01 Jan 2023 2023-01-04 2023-01-25 2023-03-12 2023-11-05
2 A01 A01 March 2023 2023-03-01 2023-03-11 2023-03-12 2023-11-05
3 A01 A01 March 2023 2023-03-12 2023-03-24 2023-03-12 2023-11-05
4 A01 A01 November 2022 2022-11-01 2022-11-05 2022-03-13 2022-11-06
5 A01 A01 November 2022 2022-11-06 2022-11-27 2022-03-13 2022-11-06
6 A01 A01 June 2022 2022-06-05 2022-06-24 2022-03-13 2022-11-06
7 A01 A01 March 2022 2022-03-02 2022-03-12 2022-03-13 2022-11-06
8 A01 A01 March 2022 2022-03-13 2022-03-29 2022-03-13 2022-11-06
9 B01 B02 March 2022 2022-03-04 2022-03-12 2022-03-13 2022-11-06
10 B01 B02 March 2022 2022-03-13 2022-03-16 2022-03-13 2022-11-06
11 B01 B02 November 2022 2022-11-05 2022-11-05 2022-03-13 2022-11-06
12 B01 B02 November 2022 2022-11-06 2022-11-28 2022-03-13 2022-11-06
13 B01 B02 March 2023 2023-03-02 2023-03-11 2023-03-12 2023-11-05
14 B01 B02 March 2023 2023-03-12 2023-03-30 2023-03-12 2023-11-05
15 B01 B02 May 2022 2022-05-03 2022-05-30 2022-03-13 2022-11-06
Upvotes: 0
Views: 52
Reputation: 160607
Here's a two-step:
library(dplyr)
df <- df %>%
mutate(
split_left = start_date < daylight_begin & between(end_date, daylight_begin, daylight_end),
split_right = between(start_date, daylight_begin, daylight_end) & end_date > daylight_end
)
bind_rows(
filter(df, !split_left, !split_right),
filter(df, split_right) %>% mutate(end_date = daylight_end - 1),
filter(df, split_right) %>% mutate(start_date = daylight_end),
filter(df, split_left) %>% mutate(start_date = daylight_begin),
filter(df, split_left) %>% mutate(end_date = daylight_begin - 1)
) %>%
select(-split_left, -split_right)
# id id_month_year start_date end_date daylight_begin daylight_end
# 1 A01 A01 Jan 2023 2023-01-04 2023-01-25 2023-03-12 2023-11-05
# 2 A01 A01 June 2022 2022-06-05 2022-06-24 2022-03-13 2022-11-06
# 3 B01 B02 May 2022 2022-05-03 2022-05-30 2022-03-13 2022-11-06
# 4 A01 A01 November 2022 2022-11-01 2022-11-05 2022-03-13 2022-11-06
# 5 B01 B02 November 2022 2022-11-05 2022-11-05 2022-03-13 2022-11-06
# 6 A01 A01 November 2022 2022-11-06 2022-11-27 2022-03-13 2022-11-06
# 7 B01 B02 November 2022 2022-11-06 2022-11-28 2022-03-13 2022-11-06
# 8 A01 A01 March 2023 2023-03-12 2023-03-24 2023-03-12 2023-11-05
# 9 A01 A01 March 2022 2022-03-13 2022-03-29 2022-03-13 2022-11-06
# 10 B01 B02 March 2022 2022-03-13 2022-03-16 2022-03-13 2022-11-06
# 11 B01 B02 March 2023 2023-03-12 2023-03-30 2023-03-12 2023-11-05
# 12 A01 A01 March 2023 2023-03-01 2023-03-11 2023-03-12 2023-11-05
# 13 A01 A01 March 2022 2022-03-02 2022-03-12 2022-03-13 2022-11-06
# 14 B01 B02 March 2022 2022-03-04 2022-03-12 2022-03-13 2022-11-06
# 15 B01 B02 March 2023 2023-03-02 2023-03-11 2023-03-12 2023-11-05
Upvotes: 1