I have a data similar to what we see below. In the actual data, 1 PATIENT_ID may have more than 2 rows. df # PATIENT_ID DRUG_CLASS start end duration # <dbl> <chr> <date> <date> <drtn> # 1 SU 2014-03-05 2016-08-04 883 days # 1 DPP4 2016-07-22 2016-08-26 35 days # 2 SU 2014-01-07 2017-06-23 1263 days # 2 DPP4 2015-03-04 2017-12-27 1029 days # 3 SU 2014-01-09 2014-12-05 330 days # 3 SU 2015-01-08 2016-01-07 364 days What I want to do is to take the first row for every patient, but if the interval overlaps with the next interval, in the next row, then we take the end date to be the start date of the subsequent row - 1 . Example: For 1st patient, since the end date is greater than the start date of the next row, i.e. "2016-08-04" > "2016-07-22" , then the new end date should be "2016-07-21" Similarly, for the 2nd patient, we take the new end date as "2015-03-04" - 1 = "2015-03-03" For the last patient, since there is no overlapping, take the first row as-is Final Data: df # PATIENT_ID DRUG_CLASS start end # <dbl> <chr> <date> <date> # 1 SU 2014-03-05 2016-07-21 # 2 SU 2014-01-07 2015-03-03 # 3 SU 2014-01-09 2014-12-05 Code: df <- structure(list(PATIENT_ID = c(1, 1, 2, 2, 3, 3), DRUG_CLASS = c("SU", "DPP4", "SU", "DPP4", "SU", "SU"), start = structure(c(16134, 17004, 16077, 16498, 16079, 16443), class = "Date"), end = structure(c(17017, 17039, 17340, 17527, 16409, 16807), class = "Date"), duration = structure(c(883, 35, 1263, 1029, 330, 364), class = "difftime", units = "days")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Reputation: 1644

How to align date based on the next row in r?

I have a data similar to what we see below. In the actual data, 1 PATIENT_ID may have more than 2 rows.

df
# PATIENT_ID DRUG_CLASS start      end        duration 
# <dbl>      <chr>      <date>     <date>     <drtn>   
#          1 SU         2014-03-05 2016-08-04  883 days
#          1 DPP4       2016-07-22 2016-08-26   35 days
#          2 SU         2014-01-07 2017-06-23 1263 days
#          2 DPP4       2015-03-04 2017-12-27 1029 days
#          3 SU         2014-01-09 2014-12-05  330 days
#          3 SU         2015-01-08 2016-01-07  364 days

What I want to do is to take the first row for every patient, but if the interval overlaps with the next interval, in the next row, then we take the end date to be the start date of the subsequent row - 1. Example:

For 1st patient, since the end date is greater than the start date of the next row, i.e. "2016-08-04" > "2016-07-22", then the new end date should be "2016-07-21"
Similarly, for the 2nd patient, we take the new end date as "2015-03-04" - 1 = "2015-03-03"
For the last patient, since there is no overlapping, take the first row as-is

Final Data:

df
# PATIENT_ID DRUG_CLASS start      end        
# <dbl>      <chr>      <date>     <date>        
#          1 SU         2014-03-05 2016-07-21  
#          2 SU         2014-01-07 2015-03-03 
#          3 SU         2014-01-09 2014-12-05

Code:

df <- structure(list(PATIENT_ID = c(1, 1, 2, 2, 3, 3), DRUG_CLASS = c("SU", 
"DPP4", "SU", "DPP4", "SU", "SU"), start = structure(c(16134, 
17004, 16077, 16498, 16079, 16443), class = "Date"), end = structure(c(17017, 
17039, 17340, 17527, 16409, 16807), class = "Date"), duration = structure(c(883, 
35, 1263, 1029, 330, 364), class = "difftime", units = "days")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Answers (3)

Roman

Reputation: 17648

Perfect for tidyverse

library(tidyverse)
df %>% 
  group_by(PATIENT_ID) %>% 
  summarise(DRUG_CLASS = toString(unique(DRUG_CLASS)),
            end= if_else(first(end) > last(start) , max(start)-1, min(end)),
            start = first(start)) %>% 
  select(1:2, start, end)
# A tibble: 3 x 4
  PATIENT_ID DRUG_CLASS start      end       
       <dbl> <chr>      <date>     <date>    
1          1 SU, DPP4   2014-03-05 2016-07-21
2          2 SU, DPP4   2014-01-07 2015-03-03
3          3 SU         2014-01-09 2014-12-05

Upvotes: 0

Martin Gal

Reputation: 16978

You could use

library(lubridate)
library(dplyr)

df %>% 
  group_by(PATIENT_ID) %>% 
  mutate(new_end = min(coalesce(lead(start), end) - 1, end)) %>% 
  slice_head(n = 1) %>% 
  select(PATIENT_ID, DRUG_CLASS, start, end = new_end) %>% 
  ungroup()

which returns

# A tibble: 3 x 4
  PATIENT_ID DRUG_CLASS start      end       
       <dbl> <chr>      <date>     <date>    
1          1 SU         2014-03-05 2016-07-21
2          2 SU         2014-01-07 2015-03-03
3          3 SU         2014-01-09 2014-12-05

Upvotes: 1

Quixotic22

Reputation: 2924

I'd just double check that the arranging of dates works okay

library(dplyr)
library(lubridate)
df %>%
  as_tibble() %>%
  arrange(PATIENT_ID, start) %>%
  group_by(PATIENT_ID) %>%
  mutate(
    x1 = lead(start, 1L, dmy("01/01/2999"))-1,
    end_new = if_else(end > x1, x1, end)
  ) %>%
  select(-x1)

Upvotes: 1

How to align date based on the next row in r?

Answers (3)

Related Questions