Reputation: 83
I have a column of start and stop dates, and I need to extract the latest (most recent) stop date in order to calculate duration. (earliest start date - latest stop date) Unfortunately, the dates in the last column are not necessarily the latest dates. So, I would have to go by row and compare the dates to figure out the latest date. The other caveat is that not all columns will have dates.
Here's an example column of dates:
pacman::p_load(tibble, lubridate)
start_1 <- as_tibble(sample(seq(ymd("1999/01/01"), ymd("2000/01/01"), by="day"), 5))
stop_1 <- as_tibble(sample(seq(ymd("2000/01/01"), ymd("2001/01/01"), by="day"), 5))
stop_2 <- as_tibble(c(ymd("2000/03/05"), ymd("2000/11/15"), ymd("2000/07/22"), ymd("2000/05/05"), NA))
stop_3 <- as_tibble(c(ymd("2000/12/12"), ymd("2000/02/09"), NA, NA, NA))
dat <- cbind(start_1, stop_1, stop_2, stop_3)
I really have no idea how to go about this, and would appreciate any help.
Thank you!
Upvotes: 1
Views: 523
Reputation: 66
One option is to use apply():
durs = as.Date(apply(dat[,c(2:ncol(dat))],1,max,na.rm=T))-dat[,1]
This assumes that the first column contains the start date and all columns thereafter contain possible stop dates.
Upvotes: 5
Reputation: 24722
First fix the column names and then use rowwise()
with c_across()
.
colnames(dat) = c("start_1", "stop_1", "stop_2", "stop_3")
dat %>%
rowwise() %>%
mutate(LastDate=max(c_across(starts_with("stop")), na.rm=T),
Duration = LastDate-start_1)
start_1 stop_1 stop_2 stop_3 LastDate Duration
<date> <date> <date> <date> <date> <drtn>
1 1999-10-20 2000-11-12 2000-03-05 2000-12-12 2000-12-12 419 days
2 1999-04-30 2000-05-05 2000-11-15 2000-02-09 2000-11-15 565 days
3 1999-05-01 2000-04-01 2000-07-22 NA 2000-07-22 448 days
4 1999-04-17 2000-08-23 2000-05-05 NA 2000-08-23 494 days
5 1999-04-10 2000-04-02 NA NA 2000-04-02 358 days
Upvotes: 1