Reputation: 303
I would try explaining my problem by taking a sample data
ID Region Start_Date End_Date
1 Reg1 27/1/2017 27/1/2017
2 Reg1 27/2/2017 05/3/2017
1 Reg1 24/3/2017 25/5/2017
Now I want to outcome to be something like this:
ID Region n_Start_Date n_End_Date
1 Reg1 27/1/2017 27/1/2017
2 Reg2 27/2/2017 28/2/2017
2 Reg2 01/3/2017 05/3/2017
1 Reg1 24/3/2017 31/3/2017
1 Reg1 01/4/2017 30/4/2017
1 Reg1 01/5/2017 31/5/2017
Current Approach which I'm thinking of implementing:
I've created a dataframe which has 14 records with every month start date and end date for year 2017 and 2018 like:
Year Month Start of Month End of Month
2017 1 1/1/2017 31/1/2017
2017 2 1/2/2017 28/2/2017
2017 3 1/3/2017 31/3/2017
2017 4 1/4/2017 30/4/2017
2017 5 1/5/2017 31/5/2017
2017 6 1/6/2017 30/6/2017
2017 7 1/7/2017 31/7/2017
2017 8 1/8/2017 31/8/2017
2017 9 1/9/2017 30/9/2017
2017 10 1/10/2017 31/10/2017
2017 11 1/11/2017 30/11/2017
2017 12 1/12/2017 31/12/2017
2018 1 2/12/2017 31/1/2018
2018 2 3/12/2017 28/2/2018
I've made a new column for year and month:
If the start date year, Month are same as that of end date year, month then next same start and end date would be copied to the new dataframe like
ID Region Start_Date End_Date n_Start_Date n_End_Date
1 Reg1 27/1/2017 27/1/2017 27/1/2017 27/1/2017
If the Start date year, Month are not same then it appends
ID Region Start_Date End_Date n_Start_Date n_End_Date
2 Reg2 27/2/2017 05/3/2017 27/2/2017 28/2/2017
2 Reg2 27/2/2017 05/3/2017 01/3/2017 05/3/2017
I couldn't find any similar questions, I've gone through this link, but not useful.
if there is any better approach do let me know.
Upvotes: 3
Views: 767
Reputation: 4671
I think I've understood what you want, if you have a date where the end date is not in the same year and month you generate a new row until it does. The generated row should start one day after and end at the end of that month.
# packages we need
library(tidyverse)
library(lubridate)
test_data <- tribble(
~ID, ~Region, ~Start_Date, ~End_Date,
1L, "Reg1", "27/1/2017", "27/1/2017",
2L, "Reg2", "27/2/2017", "05/3/2017",
1L, "Reg1", "24/3/2017", "25/5/2017"
) %>% mutate_at(vars(Start_Date, End_Date), dmy)
If we make a function does what you want given any start and end we can apply it easily after.
expand_dates <- function(start, end) {
# the number of entries we want to add
to_add <- month(end) - month(start)
# Take the start date, roll it forwards until the month is equal to the end month
start_dates <- start + months(0:to_add)
# everything but the first start_date is rolled back to first of month
start_dates <- c(start_dates[1],
rollback(start_dates[-1], roll_to_first = T))
# end dates are just the start_dates rolled forwards to the end of the month
# apply to all but last, thats the end date
end_dates <- c(rollback(ceiling_date(start_dates[-length(start_dates)], unit = "months")), end)
data.frame(start_dates = start_dates,
end_dates = end_dates)
}
We can just use map2
from purrr
which enables us to iterate over both the start and end date. Using mutate
we've added in a column which is a list. Each element in our list column is a data.frame which is output from our new function. We will use unnest
to then expand our data to what is desired.
test_data %>%
mutate(test = map2(Start_Date, End_Date, expand_dates)) %>%
unnest()
# A tibble: 6 x 6
ID Region Start_Date End_Date start_dates end_dates
<int> <chr> <date> <date> <date> <date>
1 1 Reg1 2017-01-27 2017-01-27 2017-01-27 2017-01-27
2 2 Reg2 2017-02-27 2017-03-05 2017-02-27 2017-02-28
3 2 Reg2 2017-02-27 2017-03-05 2017-03-01 2017-03-05
4 1 Reg1 2017-03-24 2017-05-25 2017-03-24 2017-03-31
5 1 Reg1 2017-03-24 2017-05-25 2017-04-01 2017-04-30
6 1 Reg1 2017-03-24 2017-05-25 2017-05-01 2017-05-25
Upvotes: 2