Reputation: 237
df
Patient_ID ADM_DATE DIS_DATE
278328 4/17/2007 4/19/2007
279347 2/6/2012 2/7/2012
279347 2/28/2012 3/3/2012
287171 1/11/2012 1/14/2012
287171 1/23/2013 2/4/2013
353079 7/12/2011 7/15/2011
608639 10/5/2010 10/7/2010
608639 2/16/2012 2/19/2012
608639 5/2/2012 5/4/2012
608639 11/27/2012 12/4/2012
I need to find the time until the next admission for a given patient. I will need add a new column which subtracts the ADM_DATE from the DIS_DATE(of the previous row) for each Patient_ID.
My final product should look like this
df1
Patient_ID ADM_DATE DIS_DATE Time_to_readmission
278328 4/17/2007 4/19/2007 NA
279347 2/6/2012 2/7/2012 NA
279347 2/28/2012 3/3/2012 21
287171 1/11/2012 1/14/2012 NA
287171 1/23/2013 2/4/2013 375
353079 7/12/2011 7/15/2011 NA
608639 10/5/2010 10/7/2010 NA
608639 2/16/2012 2/19/2012 497
608639 5/2/2012 5/4/2012 73
608639 11/27/2012 12/4/2012 207
Please help me with the required coding, I am relatively new to R. Thanks in advance.
Upvotes: 1
Views: 299
Reputation: 92282
Here's a quick data.table
implementation. First, we will convert to proper Date
classes, then we will run ADM_DATE - shift(DIS_DATE)
by Patient_ID
and wrap it up into as.integer
(or not) as it seems like you want an integer
class rather a difftime
.
library(data.table)
setDT(df)[, 2:3 := lapply(.SD, as.IDate, "%m/%d/%Y"), .SDcols = -1]
df[, Diff := as.integer(ADM_DATE - shift(DIS_DATE)), by = Patient_ID]
df
# Patient_ID ADM_DATE DIS_DATE Diff
# 1: 278328 2007-04-17 2007-04-19 NA
# 2: 279347 2012-02-06 2012-02-07 NA
# 3: 279347 2012-02-28 2012-03-03 21
# 4: 287171 2012-01-11 2012-01-14 NA
# 5: 287171 2013-01-23 2013-02-04 375
# 6: 353079 2011-07-12 2011-07-15 NA
# 7: 608639 2010-10-05 2010-10-07 NA
# 8: 608639 2012-02-16 2012-02-19 497
# 9: 608639 2012-05-02 2012-05-04 73
# 10: 608639 2012-11-27 2012-12-04 207
Or the same idea with dplyr
library(dplyr)
df %>%
mutate_each(funs(as.Date(., "%m/%d/%Y")), -1) %>%
group_by(Patient_ID) %>%
mutate(Diff = as.integer(ADM_DATE - lag(DIS_DATE)))
Upvotes: 3
Reputation: 1327
I would first format the data.frame
df[ , 2] <- as.Date(df[ , 2], format = "%m/%d/%Y")
df[ , 3] <- as.Date(df[ , 3], format = "%m/%d/%Y")
and then separate it by patient (for every unique patient search for every patient with this id):
dfList <- lapply(unique(df$Patient_ID), function(x) df[which(df$Patient_ID == x), ])
Now dfList is a list with data.frame in each item. Now check every single list item.
dfList2 <- lapply(dfList, function(x){
if (nrow(x)>1){
Diff <- c(NA)
for (i in 2:nrow(x)){
Diff[i] <- difftime(x[i,2], x[i-1,3])
}
cbind(x, Time_to_readmission = Diff)
} else {
cbind(x, Time_to_readmission = NA)
}
})
Now put it alltogether:
do.call("rbind", dfList2)
This is probably not the most elegant way, but I think it works and it is comprehensible.
Upvotes: 0
Reputation: 3235
Ok, I misunderstood the first time I read it, but this time it should work.
The code:
# Create a column that contains only NAs
df1$Time_to_readmission <- NA
# Make sure that the df is sorted by Patient_ID
df1 <- df1[order(df1$Patient_ID),]
# Basically, find the indices that are the same as the previous row
ind <- which(c(FALSE,diff(df1$Patient_ID) == 0))
# Calculate the difference between DIS_DATE of the previous row and ADM_DATE of the index row
differences <- as.Date(df1$ADM_DATE[ind], format="%m/%d/%Y") - as.Date(df1$DIS_DATE[ind - 1],format="%m/%d/%Y")
# Store those differences in Time_to_readmission
df1$Time_to_readmission[ind] <- differences
Upvotes: 0