Mitchell
Mitchell

Reputation: 237

Using R to subtract subsequent rows for a given value

df

Patient_ID  ADM_DATE    DIS_DATE
278328  4/17/2007   4/19/2007
279347  2/6/2012    2/7/2012
279347  2/28/2012   3/3/2012
287171  1/11/2012   1/14/2012
287171  1/23/2013   2/4/2013
353079  7/12/2011   7/15/2011
608639  10/5/2010   10/7/2010
608639  2/16/2012   2/19/2012
608639  5/2/2012    5/4/2012
608639  11/27/2012  12/4/2012

I need to find the time until the next admission for a given patient. I will need add a new column which subtracts the ADM_DATE from the DIS_DATE(of the previous row) for each Patient_ID.

My final product should look like this

df1

Patient_ID  ADM_DATE    DIS_DATE    Time_to_readmission 
278328  4/17/2007   4/19/2007   NA
279347  2/6/2012    2/7/2012    NA
279347  2/28/2012   3/3/2012    21
287171  1/11/2012   1/14/2012   NA
287171  1/23/2013   2/4/2013    375
353079  7/12/2011   7/15/2011   NA
608639  10/5/2010   10/7/2010   NA
608639  2/16/2012   2/19/2012   497
608639  5/2/2012    5/4/2012    73
608639  11/27/2012  12/4/2012   207

Please help me with the required coding, I am relatively new to R. Thanks in advance.

Upvotes: 1

Views: 299

Answers (3)

David Arenburg
David Arenburg

Reputation: 92282

Here's a quick data.table implementation. First, we will convert to proper Date classes, then we will run ADM_DATE - shift(DIS_DATE) by Patient_ID and wrap it up into as.integer (or not) as it seems like you want an integer class rather a difftime.

library(data.table)
setDT(df)[, 2:3 := lapply(.SD, as.IDate, "%m/%d/%Y"), .SDcols = -1]
df[, Diff := as.integer(ADM_DATE - shift(DIS_DATE)), by = Patient_ID]
df
#     Patient_ID   ADM_DATE   DIS_DATE Diff
#  1:     278328 2007-04-17 2007-04-19   NA
#  2:     279347 2012-02-06 2012-02-07   NA
#  3:     279347 2012-02-28 2012-03-03   21
#  4:     287171 2012-01-11 2012-01-14   NA
#  5:     287171 2013-01-23 2013-02-04  375
#  6:     353079 2011-07-12 2011-07-15   NA
#  7:     608639 2010-10-05 2010-10-07   NA
#  8:     608639 2012-02-16 2012-02-19  497
#  9:     608639 2012-05-02 2012-05-04   73
# 10:     608639 2012-11-27 2012-12-04  207

Or the same idea with dplyr

library(dplyr)
df %>%
  mutate_each(funs(as.Date(., "%m/%d/%Y")), -1) %>%
  group_by(Patient_ID) %>%
  mutate(Diff = as.integer(ADM_DATE - lag(DIS_DATE)))

Upvotes: 3

Phann
Phann

Reputation: 1327

I would first format the data.frame

df[ , 2] <- as.Date(df[ , 2], format = "%m/%d/%Y")
df[ , 3] <- as.Date(df[ , 3], format = "%m/%d/%Y")

and then separate it by patient (for every unique patient search for every patient with this id):

dfList <- lapply(unique(df$Patient_ID), function(x) df[which(df$Patient_ID == x), ])

Now dfList is a list with data.frame in each item. Now check every single list item.

dfList2 <- lapply(dfList, function(x){
  if (nrow(x)>1){
    Diff <- c(NA)
    for (i in 2:nrow(x)){
        Diff[i] <- difftime(x[i,2], x[i-1,3])
    }
    cbind(x, Time_to_readmission = Diff)
  } else {
    cbind(x, Time_to_readmission = NA)
  }
})

Now put it alltogether:

do.call("rbind", dfList2)

This is probably not the most elegant way, but I think it works and it is comprehensible.

Upvotes: 0

slamballais
slamballais

Reputation: 3235

Ok, I misunderstood the first time I read it, but this time it should work.

The code:

# Create a column that contains only NAs
df1$Time_to_readmission <- NA

# Make sure that the df is sorted by Patient_ID
df1 <- df1[order(df1$Patient_ID),]

# Basically, find the indices that are the same as the previous row
ind <- which(c(FALSE,diff(df1$Patient_ID) == 0))

# Calculate the difference between DIS_DATE of the previous row and ADM_DATE of the index row
differences <- as.Date(df1$ADM_DATE[ind], format="%m/%d/%Y") - as.Date(df1$DIS_DATE[ind - 1],format="%m/%d/%Y")

# Store those differences in Time_to_readmission
df1$Time_to_readmission[ind] <- differences

Upvotes: 0

Related Questions