Reputation: 1644
I have a data similar to what we see below. In the actual data, 1 PATIENT_ID
may have more than 2 rows.
df
# PATIENT_ID DRUG_CLASS start end duration
# <dbl> <chr> <date> <date> <drtn>
# 1 SU 2014-03-05 2016-08-04 883 days
# 1 DPP4 2016-07-22 2016-08-26 35 days
# 2 SU 2014-01-07 2017-06-23 1263 days
# 2 DPP4 2015-03-04 2017-12-27 1029 days
# 3 SU 2014-01-09 2014-12-05 330 days
# 3 SU 2015-01-08 2016-01-07 364 days
What I want to do is to take the first row for every patient, but if the interval overlaps with the next interval, in the next row, then we take the end
date to be the start
date of the subsequent row - 1
. Example:
end
date is greater than the start
date of the next row, i.e. "2016-08-04"
> "2016-07-22"
, then the new end
date should be "2016-07-21"
end
date as "2015-03-04" - 1 = "2015-03-03"
Final Data:
df
# PATIENT_ID DRUG_CLASS start end
# <dbl> <chr> <date> <date>
# 1 SU 2014-03-05 2016-07-21
# 2 SU 2014-01-07 2015-03-03
# 3 SU 2014-01-09 2014-12-05
Code:
df <- structure(list(PATIENT_ID = c(1, 1, 2, 2, 3, 3), DRUG_CLASS = c("SU",
"DPP4", "SU", "DPP4", "SU", "SU"), start = structure(c(16134,
17004, 16077, 16498, 16079, 16443), class = "Date"), end = structure(c(17017,
17039, 17340, 17527, 16409, 16807), class = "Date"), duration = structure(c(883,
35, 1263, 1029, 330, 364), class = "difftime", units = "days")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 119
Reputation: 17648
Perfect for tidyverse
library(tidyverse)
df %>%
group_by(PATIENT_ID) %>%
summarise(DRUG_CLASS = toString(unique(DRUG_CLASS)),
end= if_else(first(end) > last(start) , max(start)-1, min(end)),
start = first(start)) %>%
select(1:2, start, end)
# A tibble: 3 x 4
PATIENT_ID DRUG_CLASS start end
<dbl> <chr> <date> <date>
1 1 SU, DPP4 2014-03-05 2016-07-21
2 2 SU, DPP4 2014-01-07 2015-03-03
3 3 SU 2014-01-09 2014-12-05
Upvotes: 0
Reputation: 16978
You could use
library(lubridate)
library(dplyr)
df %>%
group_by(PATIENT_ID) %>%
mutate(new_end = min(coalesce(lead(start), end) - 1, end)) %>%
slice_head(n = 1) %>%
select(PATIENT_ID, DRUG_CLASS, start, end = new_end) %>%
ungroup()
which returns
# A tibble: 3 x 4
PATIENT_ID DRUG_CLASS start end
<dbl> <chr> <date> <date>
1 1 SU 2014-03-05 2016-07-21
2 2 SU 2014-01-07 2015-03-03
3 3 SU 2014-01-09 2014-12-05
Upvotes: 1
Reputation: 2924
I'd just double check that the arranging of dates works okay
library(dplyr)
library(lubridate)
df %>%
as_tibble() %>%
arrange(PATIENT_ID, start) %>%
group_by(PATIENT_ID) %>%
mutate(
x1 = lead(start, 1L, dmy("01/01/2999"))-1,
end_new = if_else(end > x1, x1, end)
) %>%
select(-x1)
Upvotes: 1