user63230

loop through date intervals to create multiple variables

Let's say I have the following dataset:

    library(lubridate)
    library(tidyverse)
    df <- data.frame(date1 = c("2011-09-18", "2013-03-06", "2013-08-08"),
                     date2 = c("2012-02-18", "2014-03-06", "2015-02-03"))
    df$date1 <- as.Date(parse_date_time(df$date1, "ymd"))
    df$date2 <- as.Date(parse_date_time(df$date2, "ymd"))
    df
    #        date1      date2
    # 1 2011-09-18 2012-02-18
    # 2 2013-03-06 2014-03-06
    # 3 2013-08-08 2015-02-03

I want to create indicator variables that tell whether a year is associated with the interval between the two dates at all. For example, the 3rd observation has 2013, 2014 and 2015 associated with it. Additionally, I want to create variables indicating whether a particular date falls within the interval, e.g. 1 April of each year.
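The years associated with an interval are just the integer run from the start year to the end year, and a fixed calendar date can then be checked against each of those years. A minimal base R sketch for the 3rd observation (format() is used instead of lubridate only to keep it self-contained):

```r
# Years touched by the interval 2013-08-08 .. 2015-02-03 (3rd observation)
date1 <- as.Date("2013-08-08")
date2 <- as.Date("2015-02-03")
yrs <- seq(as.integer(format(date1, "%Y")), as.integer(format(date2, "%Y")))
yrs
# [1] 2013 2014 2015

# 1 April of each of those years, and whether it falls inside [date1, date2]
apr1 <- as.Date(paste0(yrs, "-04-01"))
apr1 >= date1 & apr1 <= date2
# [1] FALSE  TRUE FALSE
```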

Desired output:

           date1      date2 y_2011 y_2012 y_2013 y_2014 y_2015 y_1st_2011 y_1st_2012 y_1st_2013 y_1st_2014 y_1st_2015
    1 2011-09-18 2012-02-18      1      1      0      0      0          0          0          0          0          0
    2 2013-03-06 2014-03-06      0      0      1      1      0          0          0          1          0          0
    3 2013-08-08 2015-02-03      0      0      1      1      1          0          0          0          1          0

Manually, I could do this with something like:

    # is 2011 associated with the interval? (start year <= 2011 <= end year)
    df$y_2011 <- if_else(year(df$date1) <= 2011 & year(df$date2) >= 2011,
                         1, 0, as.numeric(NA))
    # is 2014 associated with the interval?
    df$y_2014 <- if_else(year(df$date1) <= 2014 & year(df$date2) >= 2014,
                         1, 0, as.numeric(NA))

    # is a particular date (2014-04-01) within the interval?
    df$y_1st_2014 <- if_else(df$date1 <= as.Date("2014-04-01") &
                               df$date2 >= as.Date("2014-04-01"),
                             1, 0, as.numeric(NA))

I want to put this into a function, though, so it's more automated:

    # particular date: 1st of April of each year
    b <- seq(as.Date("2011-04-01"), by = "year", length.out = 5)
    b
    # [1] "2011-04-01" "2012-04-01" "2013-04-01" "2014-04-01" "2015-04-01"

    # for year
    a <- 2011:2015
    a
    # [1] 2011 2012 2013 2014 2015
    df[paste0("y_", a)] <- lapply(a, function(x)
      if_else(year(df$date1) <= x & year(df$date2) >= x, 1, 0, as.numeric(NA)))
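Assuming df and b as defined above, the same lapply pattern should extend to the 1 April indicators by iterating over the Date vector b instead of the years. A self-contained sketch (dates built with as.Date directly rather than parse_date_time):

```r
library(dplyr)

df <- data.frame(date1 = as.Date(c("2011-09-18", "2013-03-06", "2013-08-08")),
                 date2 = as.Date(c("2012-02-18", "2014-03-06", "2015-02-03")))
a <- 2011:2015
b <- seq(as.Date("2011-04-01"), by = "year", length.out = 5)

# lapply() over a Date vector passes each element as a Date
# (lapply calls as.list(), which dispatches to as.list.Date),
# so d can be compared with the date columns directly.
df[paste0("y_1st_", a)] <- lapply(b, function(d)
  if_else(df$date1 <= d & df$date2 >= d, 1, 0, as.numeric(NA)))
df
```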

Any suggestions? Preferably with a dplyr/purrr solution.

refs: Test if date occurs in multiple date ranges with R

Check if a date is within an interval in R

Loop to add new columns with ifelse


Answers (1)

Dave2e

Here is a solution to create the matrix of years associated with the date range:

    library(lubridate)
    library(tidyr)
    library(dplyr)
    df <- data.frame(date1 = c("2011-09-18", "2013-03-06", "2013-08-08"),
                     date2 = c("2012-02-18", "2014-03-06", "2015-02-03"))
    df$date1 <- as.Date(parse_date_time(df$date1, "ymd"))
    df$date2 <- as.Date(parse_date_time(df$date2, "ymd"))

    # identify the years associated with each row, e.g. "2013,2014,2015"
    df$year <- sapply(seq_len(nrow(df)), function(i) {
      paste(seq(as.numeric(format(df$date1[i], "%Y")),
                as.numeric(format(df$date2[i], "%Y"))), collapse = ",")
    })

    # separate into one row per year, then convert to wide format
    df %>%
      separate_rows(year, sep = ",") %>%
      mutate(value = 1) %>%
      spread(key = year, value = value, fill = 0)

    #        date1      date2 2011 2012 2013 2014 2015
    # 1 2011-09-18 2012-02-18    1    1    0    0    0
    # 2 2013-03-06 2014-03-06    0    0    1    1    0
    # 3 2013-08-08 2015-02-03    0    0    1    1    1
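spread() still works but is superseded in current tidyr by pivot_wider(). A sketch of the same reshaping with pivot_wider(), assuming tidyr 1.1.0 or later (for a scalar values_fill and names_sort). Here Map(seq, ...) builds a list-column of years instead of comma-joined strings, and names_prefix = "y_" gives the y_2011 ... column names from the question:

```r
library(dplyr)
library(tidyr)

df <- data.frame(date1 = as.Date(c("2011-09-18", "2013-03-06", "2013-08-08")),
                 date2 = as.Date(c("2012-02-18", "2014-03-06", "2015-02-03")))

wide <- df %>%
  # list-column: the integer years each interval touches
  mutate(year = Map(seq, as.integer(format(date1, "%Y")),
                         as.integer(format(date2, "%Y")))) %>%
  # one row per (interval, year) pair
  unnest(year) %>%
  mutate(value = 1) %>%
  # back to one row per interval, one y_<year> column per year
  pivot_wider(names_from = year, values_from = value,
              values_fill = 0, names_prefix = "y_", names_sort = TRUE)
wide
```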

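To test whether a particular date (e.g. 1 April) falls within each row's range, a pair of comparisons such as df$date1 <= d & d <= df$date2 works directly. The between function is less convenient here: dplyr's between(x, left, right) recycles left and right to the length of x, so passing a single date as x with the date columns as bounds does not work.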
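Combining both parts into the kind of reusable function the question asks for, a minimal purrr-based sketch might look like this. The name add_interval_flags and its month_day argument are hypothetical, and the date columns are assumed to be called date1 and date2:

```r
library(dplyr)
library(purrr)

# Hypothetical helper: adds y_<year> and y_1st_<year> indicator columns.
# Assumes `data` has Date columns date1 (start) and date2 (end).
add_interval_flags <- function(data, years, month_day = "04-01") {
  y1 <- as.integer(format(data$date1, "%Y"))
  y2 <- as.integer(format(data$date2, "%Y"))
  # Year associated with the interval: start year <= y <= end year
  year_flags <- map(years, ~ as.numeric(y1 <= .x & .x <= y2)) %>%
    set_names(paste0("y_", years))
  # Fixed calendar date (default 1 April) inside [date1, date2]
  date_flags <- map(years, function(y) {
    d <- as.Date(paste0(y, "-", month_day))
    as.numeric(data$date1 <= d & d <= data$date2)
  }) %>%
    set_names(paste0("y_1st_", years))
  bind_cols(data, as_tibble(c(year_flags, date_flags)))
}

df <- data.frame(date1 = as.Date(c("2011-09-18", "2013-03-06", "2013-08-08")),
                 date2 = as.Date(c("2012-02-18", "2014-03-06", "2015-02-03")))
out <- add_interval_flags(df, 2011:2015)
out
```

Iterating over years rather than over rows keeps each flag a vectorised comparison on whole columns.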
