boshek

Reputation: 4406

Split a row into two when a date range spans a change in calendar year

I am trying to figure out how to add a row when a date range crosses from one calendar year into the next. Below is a minimal reprex:

I have a data frame like this:

have <- data.frame(
  from = c(as.Date('2018-12-15'), as.Date('2019-12-20'), as.Date('2019-05-13')), 
  to = c(as.Date('2019-06-20'), as.Date('2020-01-25'), as.Date('2019-09-10'))
)
have
#>         from         to
#> 1 2018-12-15 2019-06-20
#> 2 2019-12-20 2020-01-25
#> 3 2019-05-13 2019-09-10

I want a data.frame in which a row is split into two whenever from and to fall in different calendar years.

want <- data.frame(
  from = c(as.Date('2018-12-15'), as.Date('2019-01-01'), as.Date('2019-12-20'), as.Date('2020-01-01'), as.Date('2019-05-13')), 
  to = c(as.Date('2018-12-31'), as.Date('2019-06-20'), as.Date('2019-12-31'), as.Date('2020-01-25'), as.Date('2019-09-10'))
)
want
#>         from         to
#> 1 2018-12-15 2018-12-31
#> 2 2019-01-01 2019-06-20
#> 3 2019-12-20 2019-12-31
#> 4 2020-01-01 2020-01-25
#> 5 2019-05-13 2019-09-10

I want to do this because, for each row, I want to know how many days fall in each year.

want$time_diff_by_year <- difftime(want$to, want$from)

Created on 2020-05-15 by the reprex package (v0.3.0)

Any base R or tidyverse solution would be much appreciated.

Upvotes: 2


Answers (1)

Ben

Reputation: 30474

You can use map2 to build, for each row, the sequence of calendar years that its date interval touches, then unnest so that each row is repeated once per year.

Then intersect each original interval with the full calendar year on its row. The intersection clips the interval at Jan 1 or Dec 31 wherever it crosses a year boundary, and leaves it unchanged when it lies entirely within one year.

library(tidyverse)
library(lubridate)

have %>%
  mutate(date_int = interval(from, to),
         # list-column: every calendar year the range touches
         year = map2(year(from), year(to), seq)) %>%
  unnest(year) %>%  # one row per (range, year) pair
  mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
         # namespaced to make sure lubridate's interval method is the one called
         year_sect = lubridate::intersect(date_int, year_int),
         from_new = as.Date(int_start(year_sect)),
         to_new = as.Date(int_end(year_sect))) %>%
  select(from_new, to_new)

Output

# A tibble: 5 x 2
  from_new   to_new    
  <date>     <date>    
1 2018-12-15 2018-12-31
2 2019-01-01 2019-06-20
3 2019-12-20 2019-12-31
4 2020-01-01 2020-01-25
5 2019-05-13 2019-09-10
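
Since the question also asks about base R, here is a minimal sketch of the same clipping idea without lubridate. The helper name split_by_year and the days column are hypothetical, not part of the answer above. The sketch assumes from <= to on every row, and it counts days inclusively (both end points), which is one more than difftime(to, from) gives.

```r
# Hypothetical helper (an assumption, not from the answer above):
# split each from/to range at calendar-year boundaries using base R only.
# Assumes from <= to on every row.
split_by_year <- function(df) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    from <- df$from[i]
    to   <- df$to[i]
    # Every calendar year the range touches
    years <- seq(as.integer(format(from, "%Y")), as.integer(format(to, "%Y")))
    # Start with full calendar years ...
    starts <- as.Date(paste0(years, "-01-01"))
    ends   <- as.Date(paste0(years, "-12-31"))
    # ... then clip the first piece to `from` and the last piece to `to`
    starts[1] <- from
    ends[length(ends)] <- to
    data.frame(from = starts, to = ends)
  })
  do.call(rbind, pieces)
}

have <- data.frame(
  from = as.Date(c("2018-12-15", "2019-12-20", "2019-05-13")),
  to   = as.Date(c("2019-06-20", "2020-01-25", "2019-09-10"))
)

result <- split_by_year(have)
# Inclusive day count per piece (an assumption about what "days in each year" means)
result$days <- as.integer(result$to - result$from) + 1L
result
```

For the sample data this should give the same five from/to pairs as the tibble above. Drop the + 1L if the end date should not be counted.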

Upvotes: 2
