user3249770
user3249770

Reputation: 375

R inserting rows between dates by group based on second column

I have a df that looks like this

ID    FINAL_DT      START_DT

23       NA        2020-03-20
25       NA        2020-04-10
29   2020-02-02    2020-01-23
30       NA        2020-01-02

What I would like to do is for each ID add a row for every month starting from START_DT and ending at whatever comes first FINAL_DT or the current date. Expected output would be the follow:

ID    FINAL_DT      START_DT      ACTIVE_MONTH

23       NA        2020-03-20       2020-03
23       NA            NA           2020-04
23       NA            NA           2020-05
25       NA        2020-04-10       2020-04
25       NA            NA           2020-05
29   2020-02-02    2020-01-23       2020-01
29   2020-02-02        NA           2020-02
30       NA        2020-01-02       2020-01
30       NA            NA           2020-02
30       NA            NA           2020-03 
30       NA            NA           2020-04
30       NA            NA           2020-05

I have the following code which works but does not account for FINAL_DT

current_date = as.Date(Sys.Date())

enroll <- enroll %>%
          group_by(ID) %>% 
          complete(START_DATE = seq(START_DATE, current_date, by = "month"))

I have tried the following but get an error I believe due to the NA's:

current_date = as.Date(Sys.Date())

enroll <- enroll %>%
          group_by(ID) %>% 
          complete(START_DATE = seq(START_DATE, min(FINAL_DT,current_date), by = "month"))

The day of the month also does not matter I am not sure if it would be easier to drop that before or after.

Upvotes: 1

Views: 291

Answers (2)

Ben
Ben

Reputation: 30504

Here is another approach. You can use floor_date to get the first day of the month to use in your sequence of months. Then, you can include the full sequence to today's date, and filter based on FINAL_DT. You can use as.yearmon from zoo if you'd like a month/year object for month.

library(zoo)
library(tidyr)
library(dplyr)
library(lubridate)

current_date = as.Date(Sys.Date())

enroll %>%
  mutate(ACTIVE_MONTH = floor_date(START_DT, unit = "month")) %>%
  group_by(ID) %>%
  complete(ACTIVE_MONTH = seq.Date(floor_date(START_DT, unit = "month"), current_date, by = "month")) %>%
  filter(ACTIVE_MONTH <= first(FINAL_DT) | is.na(first(FINAL_DT))) %>%
  ungroup() %>%
  mutate(ACTIVE_MONTH = as.yearmon(ACTIVE_MONTH))

Output

# A tibble: 12 x 4
      ID ACTIVE_MONTH FINAL_DT   START_DT  
   <dbl> <yearmon>    <date>     <date>    
 1    23 Mar 2020     NA         2020-03-20
 2    23 Apr 2020     NA         NA        
 3    23 May 2020     NA         NA        
 4    25 Apr 2020     NA         2020-04-10
 5    25 May 2020     NA         NA        
 6    29 Jan 2020     2020-02-02 2020-01-23
 7    29 Feb 2020     NA         NA        
 8    30 Jan 2020     NA         2020-01-02
 9    30 Feb 2020     NA         NA        
10    30 Mar 2020     NA         NA        
11    30 Apr 2020     NA         NA        
12    30 May 2020     NA         NA

Upvotes: 1

Ian Campbell
Ian Campbell

Reputation: 24878

Here is an approach that returns rows for each MONTH with the help of lubridate.

library(dplyr)
library(tidyr)
library(lubridate)
current_date = as.Date(Sys.Date())

enroll %>%
  mutate(MONTH = month(START_DT)) %>%
  group_by(ID) %>% 
  complete(MONTH = seq(MONTH, min(month(FINAL_DT)[!is.na(FINAL_DT)],month(current_date))))
# A tibble: 12 x 4
# Groups:   ID [4]
#      ID MONTH FINAL_DT   START_DT  
#   <int> <dbl> <fct>      <fct>     
# 1    23     3 NA         2020-03-20
# 2    23     4 NA         NA        
# 3    23     5 NA         NA        
# 4    25     4 NA         2020-04-10
# 5    25     5 NA         NA        
# 6    29     1 2020-02-02 2020-01-23
# 7    29     2 NA         NA        
# 8    30     1 NA         2020-01-02
# 9    30     2 NA         NA        
#10    30     3 NA         NA        
#11    30     4 NA         NA        
#12    30     5 NA         NA

Upvotes: 1

Related Questions