Rnout
Rnout

Reputation: 29

R Create new rows in data frame from given dates between two time points

I have a df with start dates and end dates of some process. Replication data:

ID1 <- c("AUT","AUT","AUT","BEL","BEL","BEL")
start_date <- c("2008-12-02", "2013-12-16", "2016-05-17", "2007-06-10", "2007-12-21", "2008-03-20")
start_date <- as.Date(start_date, "%Y-%m-%d")
end_date <- c("2013-12-15", "2016-05-16", "2017-11-30", "2007-12-20", "2008-03-19", "2008-12-29")
end_date <- as.Date(end_date, "%Y-%m-%d")
ID2 <- 1:6
df <- data.frame(ID1, as.character(start_date), as.character(end_date), ID2)

Which looks like:

   ID1       start_date     end_date       ID2
1  AUT      2008-12-02     2013-12-15      1     
2  AUT      2013-12-16     2016-05-16      2     
3  AUT      2016-05-17     2017-11-30      3     
4  BEL      2007-06-10     2007-12-20      4     
5  BEL      2007-12-21     2008-03-19      5     
6  BEL      2008-03-20     2008-12-29      6

I want to add a new column and rows for every year in the given periods to this df: the column should be year, the value of which depends on whether the first day of the year (20XX-01-01) is inside or outside the process bounds. What i'd like to see is:

    ID1       start_date     end_date       ID2    year
1   AUT      2008-12-02     2013-12-15      1     2009
2   AUT      2008-12-02     2013-12-15      1     2010
2   AUT      2008-12-02     2013-12-15      1     2011
4   AUT      2008-12-02     2013-12-15      1     2012
5   AUT      2008-12-02     2013-12-15      1     2013
6   AUT      2013-12-16     2016-05-16      2     2014
7   AUT      2013-12-16     2016-05-16      2     2015
8   AUT      2013-12-16     2016-05-16      2     2016
9   AUT      2016-05-17     2017-11-30      3     2017
10  BEL      2007-06-10     2007-12-20      4     NA
11  BEL      2007-12-21     2008-03-19      5     2008
12  BEL      2008-03-20     2008-12-29      6     NA

Edit: minor suggested changes to enhance clarity

Upvotes: 1

Views: 161

Answers (1)

amonk
amonk

Reputation: 1795

So I modified the initial code slightly :

library(lubridate)

library(data.table)

ID1 <- c("AUT","AUT","AUT","BEL","BEL","BEL")

start_date <- ymd("2008-12-02", "2013-12-16", "2016-05-17", "2007-06-10", "2007-12-21", "2008-03-20")

end_date <- ymd("2013-12-15", "2016-05-16", "2017-11-30", "2007-12-20", "2008-03-19", "2008-12-29")

ID2 <- 1:6

df <- data.table(ID1, start_date, end_date, ID2)

df[,yearDiff:=year(end_date)-year(start_date)]

df<-df[,cbind(.SD,year=(year(start_date)+1):year(end_date)),by="ID2"]

df[,dateInterval:=interval(df$start_date,df$end_date)]

df[,IsYearWithinDate:=(ymd(paste0(year,"01","01",sep="-"))%within% dateInterval)]

df[,.(ID1,start_date,end_date,ID2,year,IsYearWithinDate)]

resulting in (via df[,.(ID1,start_date,end_date,ID2,year):

   ID1 start_date   end_date ID2 year
 1: AUT 2008-12-02 2013-12-15   1 2009
 2: AUT 2008-12-02 2013-12-15   1 2010
 3: AUT 2008-12-02 2013-12-15   1 2011
 4: AUT 2008-12-02 2013-12-15   1 2012
 5: AUT 2008-12-02 2013-12-15   1 2013
 6: AUT 2013-12-16 2016-05-16   2 2014
 7: AUT 2013-12-16 2016-05-16   2 2015
 8: AUT 2013-12-16 2016-05-16   2 2016
 9: AUT 2016-05-17 2017-11-30   3 2017
10: BEL 2007-06-10 2007-12-20   4 2008
11: BEL 2007-06-10 2007-12-20   4 2007
12: BEL 2007-12-21 2008-03-19   5 2008
13: BEL 2008-03-20 2008-12-29   6 2009
14: BEL 2008-03-20 2008-12-29   6 2008

Upvotes: 1

Related Questions