mkpcr
mkpcr

Reputation: 511

Create single year variables corresponding to a range of dates in R

I would like to be able to take two variables representing a starting date and an ending date and create variables indicating which years are covered over the range of those two dates.

What I have:

df1 <- data.frame(ID = c("A", "B", "C"),
                 Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"), 
                 End_Date = c("6/25/2009","11/2/2006", "6/9/2011"))

What I want:

df2 <- data.frame(ID = c("A", "B", "C"),
                 Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"), 
                 End_Date = c("6/25/2009","11/2/2006", "6/9/2011"),
                 y2004 = c(1, 0, 0), 
                 y2005 = c(1, 1, 0), 
                 y2006 = c(1, 1, 0), 
                 y2007 = c(1, 0, 0), 
                 y2008 = c(1, 0, 1), 
                 y2009 = c(0, 0, 1), 
                 y2010 = c(0, 0, 1), 
                 y2011 = c(0, 0, 1))

As above, each new year variable indicates whether or not the year is captured in the range of the two date variables "Start_Date" and "End_Date".

Any ideas would be greatly appreciated. Thanks in advance.

Upvotes: 0

Views: 449

Answers (1)

akrun
akrun

Reputation: 887088

One method is to pivot to 'long' format, extract the year part after converting to Date class, then get the seq (:) from the first to last grouped by 'ID' and reshape back to 'wide', then join with the original data by 'ID'

library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
df1 %>%
     pivot_longer(cols = -ID) %>%
     group_by(ID) %>% 
     summarise(year = str_c('y', year(mdy(value)[1]):year(mdy(value)[2])),
            n = 1, .groups = 'drop') %>% 
     pivot_wider(names_from = year, values_from = n, values_fill = 0) %>% 
     left_join(df1, .)

-output

#   ID Start_Date  End_Date y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011
#1  A   3/5/2004 6/25/2009     1     1     1     1     1     1     0     0
#2  B  8/22/2005 11/2/2006     0     1     1     0     0     0     0     0
#3  C   4/8/2008  6/9/2011     0     0     0     0     1     1     1     1

Upvotes: 1

Related Questions