Reputation: 375
I have a df that looks like this
ID FINAL_DT START_DT
23 NA 2020-03-20
25 NA 2020-04-10
29 2020-02-02 2020-01-23
30 NA 2020-01-02
What I would like to do is for each ID add a row for every month starting from START_DT and ending at whatever comes first FINAL_DT or the current date. Expected output would be the follow:
ID FINAL_DT START_DT ACTIVE_MONTH
23 NA 2020-03-20 2020-03
23 NA NA 2020-04
23 NA NA 2020-05
25 NA 2020-04-10 2020-04
25 NA NA 2020-05
29 2020-02-02 2020-01-23 2020-01
29 2020-02-02 NA 2020-02
30 NA 2020-01-02 2020-01
30 NA NA 2020-02
30 NA NA 2020-03
30 NA NA 2020-04
30 NA NA 2020-05
I have the following code which works but does not account for FINAL_DT
current_date = as.Date(Sys.Date())
enroll <- enroll %>%
group_by(ID) %>%
complete(START_DATE = seq(START_DATE, current_date, by = "month"))
I have tried the following but get an error I believe due to the NA's:
current_date = as.Date(Sys.Date())
enroll <- enroll %>%
group_by(ID) %>%
complete(START_DATE = seq(START_DATE, min(FINAL_DT,current_date), by = "month"))
The day of the month also does not matter I am not sure if it would be easier to drop that before or after.
Upvotes: 1
Views: 291
Reputation: 30504
Here is another approach. You can use floor_date
to get the first day of the month to use in your sequence of months. Then, you can include the full sequence to today's date, and filter based on FINAL_DT
. You can use as.yearmon
from zoo
if you'd like a month/year object for month.
library(zoo)
library(tidyr)
library(dplyr)
library(lubridate)
current_date = as.Date(Sys.Date())
enroll %>%
mutate(ACTIVE_MONTH = floor_date(START_DT, unit = "month")) %>%
group_by(ID) %>%
complete(ACTIVE_MONTH = seq.Date(floor_date(START_DT, unit = "month"), current_date, by = "month")) %>%
filter(ACTIVE_MONTH <= first(FINAL_DT) | is.na(first(FINAL_DT))) %>%
ungroup() %>%
mutate(ACTIVE_MONTH = as.yearmon(ACTIVE_MONTH))
Output
# A tibble: 12 x 4
ID ACTIVE_MONTH FINAL_DT START_DT
<dbl> <yearmon> <date> <date>
1 23 Mar 2020 NA 2020-03-20
2 23 Apr 2020 NA NA
3 23 May 2020 NA NA
4 25 Apr 2020 NA 2020-04-10
5 25 May 2020 NA NA
6 29 Jan 2020 2020-02-02 2020-01-23
7 29 Feb 2020 NA NA
8 30 Jan 2020 NA 2020-01-02
9 30 Feb 2020 NA NA
10 30 Mar 2020 NA NA
11 30 Apr 2020 NA NA
12 30 May 2020 NA NA
Upvotes: 1
Reputation: 24878
Here is an approach that returns rows for each MONTH
with the help of lubridate
.
library(dplyr)
library(tidyr)
library(lubridate)
current_date = as.Date(Sys.Date())
enroll %>%
mutate(MONTH = month(START_DT)) %>%
group_by(ID) %>%
complete(MONTH = seq(MONTH, min(month(FINAL_DT)[!is.na(FINAL_DT)],month(current_date))))
# A tibble: 12 x 4
# Groups: ID [4]
# ID MONTH FINAL_DT START_DT
# <int> <dbl> <fct> <fct>
# 1 23 3 NA 2020-03-20
# 2 23 4 NA NA
# 3 23 5 NA NA
# 4 25 4 NA 2020-04-10
# 5 25 5 NA NA
# 6 29 1 2020-02-02 2020-01-23
# 7 29 2 NA NA
# 8 30 1 NA 2020-01-02
# 9 30 2 NA NA
#10 30 3 NA NA
#11 30 4 NA NA
#12 30 5 NA NA
Upvotes: 1