Toros91
Toros91

Reputation: 303

How to split a record into multiple record based on start and end date R

I would try explaining my problem by taking a sample data

ID      Region      Start_Date     End_Date
 1       Reg1       27/1/2017      27/1/2017
 2       Reg1       27/2/2017      05/3/2017
 1       Reg1       24/3/2017      25/5/2017

Now I want to outcome to be something like this:

ID      Region      n_Start_Date   n_End_Date
 1       Reg1       27/1/2017      27/1/2017
 2       Reg2       27/2/2017      28/2/2017
 2       Reg2       01/3/2017      05/3/2017
 1       Reg1       24/3/2017      31/3/2017
 1       Reg1       01/4/2017      30/4/2017
 1       Reg1       01/5/2017      31/5/2017

Current Approach which I'm thinking of implementing:

I've created a dataframe which has 14 records with every month start date and end date for year 2017 and 2018 like:

Year    Month   Start of Month  End of Month
2017      1        1/1/2017      31/1/2017
2017      2        1/2/2017      28/2/2017
2017      3        1/3/2017      31/3/2017
2017      4        1/4/2017      30/4/2017
2017      5        1/5/2017      31/5/2017
2017      6        1/6/2017      30/6/2017
2017      7        1/7/2017      31/7/2017
2017      8        1/8/2017      31/8/2017
2017      9        1/9/2017      30/9/2017
2017      10       1/10/2017     31/10/2017
2017      11       1/11/2017     30/11/2017
2017      12       1/12/2017     31/12/2017
2018      1        2/12/2017     31/1/2018
2018      2        3/12/2017     28/2/2018

I've made a new column for year and month:

  1. If the start date year, Month are same as that of end date year, month then next same start and end date would be copied to the new dataframe like

    ID      Region   Start_Date     End_Date   n_Start_Date   n_End_Date
    1       Reg1      27/1/2017     27/1/2017   27/1/2017      27/1/2017
    
  2. If the Start date year, Month are not same then it appends

    ID      Region   Start_Date     End_Date   n_Start_Date   n_End_Date
    2       Reg2      27/2/2017    05/3/2017    27/2/2017      28/2/2017
    2       Reg2      27/2/2017    05/3/2017    01/3/2017      05/3/2017
    

I couldn't find any similar questions, I've gone through this link, but not useful.

if there is any better approach do let me know.

Upvotes: 3

Views: 767

Answers (1)

zacdav
zacdav

Reputation: 4671

I think I've understood what you want, if you have a date where the end date is not in the same year and month you generate a new row until it does. The generated row should start one day after and end at the end of that month.

# packages we need
library(tidyverse)
library(lubridate)

Example data

test_data <- tribble(
  ~ID, ~Region, ~Start_Date, ~End_Date,
  1L, "Reg1", "27/1/2017", "27/1/2017",
  2L, "Reg2", "27/2/2017", "05/3/2017",
  1L, "Reg1", "24/3/2017", "25/5/2017"
) %>% mutate_at(vars(Start_Date, End_Date), dmy)

Creating a function

If we make a function does what you want given any start and end we can apply it easily after.

expand_dates <- function(start, end) {

  # the number of entries we want to add
  to_add <- month(end) - month(start) 

  # Take the start date, roll it forwards until the month is equal to the end month
  start_dates <- start + months(0:to_add)

  # everything but the first start_date is rolled back to first of month
  start_dates <- c(start_dates[1],
                   rollback(start_dates[-1], roll_to_first = T))

  # end dates are just the start_dates rolled forwards to the end of the month
  # apply to all but last, thats the end date
  end_dates <- c(rollback(ceiling_date(start_dates[-length(start_dates)], unit = "months")), end)

  data.frame(start_dates = start_dates,
             end_dates = end_dates)
}

Using the function

We can just use map2 from purrr which enables us to iterate over both the start and end date. Using mutate we've added in a column which is a list. Each element in our list column is a data.frame which is output from our new function. We will use unnest to then expand our data to what is desired.

test_data %>%
  mutate(test = map2(Start_Date, End_Date, expand_dates)) %>%
  unnest()


# A tibble: 6 x 6
     ID Region Start_Date End_Date   start_dates end_dates 
  <int> <chr>  <date>     <date>     <date>      <date>    
1     1 Reg1   2017-01-27 2017-01-27 2017-01-27  2017-01-27
2     2 Reg2   2017-02-27 2017-03-05 2017-02-27  2017-02-28
3     2 Reg2   2017-02-27 2017-03-05 2017-03-01  2017-03-05
4     1 Reg1   2017-03-24 2017-05-25 2017-03-24  2017-03-31
5     1 Reg1   2017-03-24 2017-05-25 2017-04-01  2017-04-30
6     1 Reg1   2017-03-24 2017-05-25 2017-05-01  2017-05-25

Upvotes: 2

Related Questions