How to split a record into multiple record based on start and end date R

Question

I would try explaining my problem by taking a sample data

ID      Region      Start_Date     End_Date
 1       Reg1       27/1/2017      27/1/2017
 2       Reg1       27/2/2017      05/3/2017
 1       Reg1       24/3/2017      25/5/2017

Now I want to outcome to be something like this:

ID      Region      n_Start_Date   n_End_Date
 1       Reg1       27/1/2017      27/1/2017
 2       Reg2       27/2/2017      28/2/2017
 2       Reg2       01/3/2017      05/3/2017
 1       Reg1       24/3/2017      31/3/2017
 1       Reg1       01/4/2017      30/4/2017
 1       Reg1       01/5/2017      31/5/2017

Current Approach which I'm thinking of implementing:

I've created a dataframe which has 14 records with every month start date and end date for year 2017 and 2018 like:

Year    Month   Start of Month  End of Month
2017      1        1/1/2017      31/1/2017
2017      2        1/2/2017      28/2/2017
2017      3        1/3/2017      31/3/2017
2017      4        1/4/2017      30/4/2017
2017      5        1/5/2017      31/5/2017
2017      6        1/6/2017      30/6/2017
2017      7        1/7/2017      31/7/2017
2017      8        1/8/2017      31/8/2017
2017      9        1/9/2017      30/9/2017
2017      10       1/10/2017     31/10/2017
2017      11       1/11/2017     30/11/2017
2017      12       1/12/2017     31/12/2017
2018      1        2/12/2017     31/1/2018
2018      2        3/12/2017     28/2/2018

I've made a new column for year and month:

If the start date year, Month are same as that of end date year, month then next same start and end date would be copied to the new dataframe like
```
ID      Region   Start_Date     End_Date   n_Start_Date   n_End_Date
1       Reg1      27/1/2017     27/1/2017   27/1/2017      27/1/2017
```

If the Start date year, Month are not same then it appends

ID      Region   Start_Date     End_Date   n_Start_Date   n_End_Date
2       Reg2      27/2/2017    05/3/2017    27/2/2017      28/2/2017
2       Reg2      27/2/2017    05/3/2017    01/3/2017      05/3/2017

I couldn't find any similar questions, I've gone through this link, but not useful.

if there is any better approach do let me know.

zacdav · Accepted Answer

I think I've understood what you want, if you have a date where the end date is not in the same year and month you generate a new row until it does. The generated row should start one day after and end at the end of that month.

# packages we need
library(tidyverse)
library(lubridate)

Example data

test_data <- tribble(
  ~ID, ~Region, ~Start_Date, ~End_Date,
  1L, "Reg1", "27/1/2017", "27/1/2017",
  2L, "Reg2", "27/2/2017", "05/3/2017",
  1L, "Reg1", "24/3/2017", "25/5/2017"
) %>% mutate_at(vars(Start_Date, End_Date), dmy)

Creating a function

If we make a function does what you want given any start and end we can apply it easily after.

expand_dates <- function(start, end) {

  # the number of entries we want to add
  to_add <- month(end) - month(start) 

  # Take the start date, roll it forwards until the month is equal to the end month
  start_dates <- start + months(0:to_add)

  # everything but the first start_date is rolled back to first of month
  start_dates <- c(start_dates[1],
                   rollback(start_dates[-1], roll_to_first = T))

  # end dates are just the start_dates rolled forwards to the end of the month
  # apply to all but last, thats the end date
  end_dates <- c(rollback(ceiling_date(start_dates[-length(start_dates)], unit = "months")), end)

  data.frame(start_dates = start_dates,
             end_dates = end_dates)
}

Using the function

We can just use map2 from purrr which enables us to iterate over both the start and end date. Using mutate we've added in a column which is a list. Each element in our list column is a data.frame which is output from our new function. We will use unnest to then expand our data to what is desired.

test_data %>%
  mutate(test = map2(Start_Date, End_Date, expand_dates)) %>%
  unnest()


# A tibble: 6 x 6
     ID Region Start_Date End_Date   start_dates end_dates 
                         
1     1 Reg1   2017-01-27 2017-01-27 2017-01-27  2017-01-27
2     2 Reg2   2017-02-27 2017-03-05 2017-02-27  2017-02-28
3     2 Reg2   2017-02-27 2017-03-05 2017-03-01  2017-03-05
4     1 Reg1   2017-03-24 2017-05-25 2017-03-24  2017-03-31
5     1 Reg1   2017-03-24 2017-05-25 2017-04-01  2017-04-30
6     1 Reg1   2017-03-24 2017-05-25 2017-05-01  2017-05-25

How to split a record into multiple record based on start and end date R

Answers (1)

Example data

Creating a function

Using the function

Related Questions