gobygoul

Reputation: 5

How to create a dummy variable in R for dates that lie between a certain interval?

I have some hospital data that looks like this:

patient_id treatment_1 treatment_2 date_dummy
3 2012-01-04 2012-03-27 0
3 2012-07-11 2012-10-20 0
3 2013-04-04 2013-06-22 0
12 2012-12-09 2013-11-09 0
18 2012-02-25 2012-03-26 0
25 2012-10-06 2013-12-29 1
25 2013-04-06 2013-07-07 0

I need to re-create the date_dummy variable that equals 1 if the patient was treated again between the two treatment dates, and 0 otherwise. Patient 25 is the best example of this.

If anyone knows a command to do this using the dplyr package in R, that would be awesome. Thanks for any help.

Upvotes: 0

Views: 1302

Answers (2)

Ronak Shah

Reputation: 388862

Building upon @Rex Parsons's answer, you can compare each row's interval against the other intervals of the same patient:

library(dplyr)
library(lubridate)
library(purrr)

df %>%
  mutate(across(starts_with('treatment'), as.Date), 
         interval = interval(treatment_1, treatment_2)) %>%
  group_by(patient_id) %>%
  mutate(date_dummy = map_int(row_number(), 
                       ~as.integer(any(interval[-.x] %within% interval[.x])))) %>%
  ungroup()

#  patient_id treatment_1 treatment_2 date_dummy interval                      
#       <int> <date>      <date>           <int> <Interval>                    
#1          3 2012-01-04  2012-03-27           0 2012-01-04 UTC--2012-03-27 UTC
#2          3 2012-07-11  2012-10-20           0 2012-07-11 UTC--2012-10-20 UTC
#3          3 2013-04-04  2013-06-22           0 2013-04-04 UTC--2013-06-22 UTC
#4         12 2012-12-09  2013-11-09           0 2012-12-09 UTC--2013-11-09 UTC
#5         18 2012-02-25  2012-03-26           0 2012-02-25 UTC--2012-03-26 UTC
#6         25 2012-10-06  2013-12-29           1 2012-10-06 UTC--2013-12-29 UTC
#7         25 2013-04-06  2013-07-07           0 2013-04-06 UTC--2013-07-07 UTC

You may want to drop the interval column from the final output, for example with select(-interval), if you don't need it.
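If you would rather not depend on lubridate and purrr, the same check can probably be written with plain date comparisons. This is only a sketch: the df below is rebuilt by hand from the question's sample data (assuming the second row should read 2012-07-11, as in the output above), and result is just a name I picked. An interval counts as "inside" another one when it starts on or after the other's start and ends on or before the other's end.

```r
library(dplyr)

# Sample data re-typed from the question (assumed values, see note above)
df <- data.frame(
  patient_id  = c(3L, 3L, 3L, 12L, 18L, 25L, 25L),
  treatment_1 = c("2012-01-04", "2012-07-11", "2013-04-04", "2012-12-09",
                  "2012-02-25", "2012-10-06", "2013-04-06"),
  treatment_2 = c("2012-03-27", "2012-10-20", "2013-06-22", "2013-11-09",
                  "2012-03-26", "2013-12-29", "2013-07-07")
)

result <- df %>%
  mutate(across(starts_with("treatment"), as.Date)) %>%
  group_by(patient_id) %>%
  mutate(date_dummy = vapply(
    seq_along(treatment_1),
    # For row i: does any OTHER row of this patient lie fully inside row i's dates?
    function(i) as.integer(any(treatment_1[-i] >= treatment_1[i] &
                               treatment_2[-i] <= treatment_2[i])),
    integer(1)
  )) %>%
  ungroup()
```

Patients with a single row compare against nothing, so any() returns FALSE and they get 0.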

Upvotes: 0

Rex Parsons

Reputation: 339

To check whether a date is within the range of two other dates, you can use:

library(lubridate)
x %within% interval(ymd(20161001), ymd(20170930))

This checks whether x falls between October 1, 2016 and September 30, 2017, with both endpoints included.
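As a quick sanity check, here is a small sketch with a few made-up dates around the boundaries (the values in x are my own, not from the question), which should show that both endpoints count as inside the interval:

```r
library(lubridate)

# Hypothetical test dates: one day before, both endpoints, a middle date, one day after
x <- ymd(c("2016-09-30", "2016-10-01", "2017-03-15", "2017-09-30", "2017-10-01"))

in_range <- x %within% interval(ymd(20161001), ymd(20170930))
in_range
# FALSE TRUE TRUE TRUE FALSE
```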

I'm not sure what the column holding the 'treated again' date is called in your data (treated_again_date below is a placeholder), but something like this may work:

library(dplyr)

data %>%
    mutate(date_dummy = ifelse(treated_again_date %within% interval(treatment_1, treatment_2), 1, 0))
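To try this out, here is a minimal sketch on toy data. The data frame and the treated_again_date column are hypothetical (the question does not contain such a column), and flagged is just a name I chose. as.integer() on the logical result is a shorter alternative to ifelse(..., 1, 0).

```r
library(dplyr)
library(lubridate)

# Toy data; treated_again_date is a hypothetical column, not from the question
data <- data.frame(
  treatment_1        = ymd(c("2012-10-06", "2013-04-06")),
  treatment_2        = ymd(c("2013-12-29", "2013-07-07")),
  treated_again_date = ymd(c("2013-04-06", "2013-08-01"))
)

# Row-wise check: is the extra date inside that row's treatment interval?
flagged <- data %>%
  mutate(date_dummy = as.integer(
    treated_again_date %within% interval(treatment_1, treatment_2)
  ))
```

The first row's date falls inside its interval and the second row's does not, so date_dummy should come out as 1 and 0.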

Upvotes: 2
