Reputation: 3
I have the following data frame and I am looking to create a variable that calculates the amount of days between the current ADMITDATE
from the previous DISCHARGE DATE
only for duplicates or when the ID
s are equal
ID ADMITDATE DISCHARGE DATE
1 5/24/2020 5/27/2020
1 5/30/2020 6/15/2020
2 3/9/2010 4/5/2020
3 2/7/2020 3/7/2020
4 5/15/2020 5/16/2020
4 5/20/2020 5/25/2020
4 5/30/2020 6/15/2020
5 7/8/2020 7/20/2020
6 1/20/2020 1/22/2020
6 1/28/2020 2/5/2020
So I want the output to look like below.
ID ADMITDATE DISCHARGE DATE DAYDIFF
1 5/24/2020 5/27/2020 NULL
1 5/30/2020 6/15/2020 3
2 3/9/2010 4/5/2020 NULL
3 2/7/2020 3/7/2020 NULL
4 5/15/2020 5/16/2020 NULL
4 5/20/2020 5/25/2020 4
4 5/30/2020 6/15/2020 5
5 7/8/2020 7/20/2020 NULL
6 1/20/2020 1/22/2020 NULL
6 1/28/2020 2/5/2020 6
Any tips on how to write the code for this in R?
The code i currently have is
df$DAYDIFF <- [df$ID==lag(ID)] <- as.Date(as.character(test$ADMITDATE), format="%m/%d/%Y")-
lag(as.Date(as.character(test$DISCHARGEDATE), format="%m/%d/%Y"))
Upvotes: 0
Views: 160
Reputation: 767
Assuming your data is stored in a dataframe called df
:
library(tidyverse)
library(lubridate)
df %>%
mutate(ADMITDATE = mdy(ADMITDATE),
DISCHARGEDATE = mdy(DISCHARGEDATE)) %>%
group_by(ID) %>%
mutate(DAYDIFF = ADMITDATE - lag(DISCHARGEDATE, 1))
The mutate
call converts your columns to date formats using lubridate to specify the format as month-day-year. We then group_by
the ID
values, before calculating the difference between the current ADMITDATE
and the previous DISCHARGEDATE
(using lag()
)
Upvotes: 1