Mitchell

Reputation: 237

R: subtracting subsequent rows for a select amount of values

This is the data I currently have:

df

patient ID  Index_admission?    adm_date    dish_date
1244             FALSE           2/7/2009   2/8/2009
1244             TRUE            3/5/2009   3/15/2009
1244             FALSE           4/5/2011   4/7/2011
1244             FALSE           3/25/2012  3/27/2012
1244             TRUE            5/5/2012   5/20/2012
1244             TRUE            9/8/2013   9/15/2013
1244             FALSE           1/5/2014   1/15/2014
2333             FALSE           1/1/2010   1/8/2010
2333             FALSE           1/1/2011   1/5/2011
2333             TRUE            2/2/2011   2/25/2011
2333             FALSE           1/25/2012  1/28/2012   
5422             TRUE            3/5/2015   3/15/2015   
1243             TRUE            2/5/2009   2/8/2009
1243             TRUE            2/5/2011   2/19/2011

I need to find the time_to_readmission from the previous Index_admission. I need to add a new column that subtracts the dish_date of that index admission from the adm_date of the current row. This should only be done if the patient has already had a TRUE for Index_admission in an earlier row.

Also, if the patient has multiple index admissions, time_to_readmission should always be calculated from the nearest previous Index_admission.

It is probably easier to explain by showing how I want the data to look:

df1

patient ID  Index_admission?    adm_date    dish_date   time_to_readmission
1244             FALSE           2/7/2009   2/8/2009    NA
1244             TRUE            3/5/2009   3/15/2009   NA
1244             FALSE           4/5/2011   4/7/2011    751
1244             FALSE           3/25/2012  3/27/2012   1106
1244             TRUE            5/5/2012   5/20/2012   1147
1244             TRUE            9/8/2013   9/15/2013   476
1244             FALSE           1/5/2014   1/15/2014   112
2333             FALSE           1/1/2010   1/8/2010    NA
2333             FALSE           1/1/2011   1/5/2011    NA
2333             TRUE            2/2/2011   2/25/2011   NA
2333             FALSE           1/25/2012  1/28/2012   334
5422             TRUE            3/5/2015   3/15/2015   NA
1243             TRUE            2/5/2009   2/8/2009    NA
1243             TRUE            2/5/2011   2/19/2011   727
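
As a quick check of the arithmetic, the 751 in the third row is the number of days between the discharge of the index admission (3/15/2009) and the next admission (4/5/2011). A minimal sketch with the two dates typed in by hand (the variable names are only for illustration):

```r
# Discharge date of the previous index admission and the later admission date
index_dish <- as.Date("3/15/2009", format = "%m/%d/%Y")
readmit    <- as.Date("4/5/2011",  format = "%m/%d/%Y")

# Subtracting Date objects gives a difference in days
days <- as.integer(readmit - index_dish)
days  # 751
```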

Please help me with the required coding. Thanks in advance.

> dput(df)
structure(list(patient.ID = c(124L, 124L, 124L, 124L, 124L, 124L, 
124L, 233L, 233L, 233L, 233L, 542L, 1243L, 1243L), Index.admission. = c(FALSE, 
TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, 
TRUE, TRUE, TRUE), adm_date = structure(c(8L, 10L, 12L, 9L, 13L, 
14L, 4L, 1L, 2L, 5L, 3L, 11L, 6L, 7L), .Label = c("1/1/2010", 
"1/1/2011", "1/25/2012", "1/5/2014", "2/2/2011", "2/5/2009", 
"2/5/2011", "2/7/2009", "3/25/2012", "3/5/2009", "3/5/2015", 
"4/5/2011", "5/5/2012", "9/8/2013"), class = "factor"), dish_date = structure(c(7L, 
8L, 11L, 10L, 12L, 13L, 1L, 4L, 3L, 6L, 2L, 9L, 7L, 5L), .Label = c("1/15/2014", 
"1/28/2012", "1/5/2011", "1/8/2010", "2/19/2011", "2/25/2011", 
"2/8/2009", "3/15/2009", "3/15/2015", "3/27/2012", "4/7/2011", 
"5/20/2012", "9/15/2013"), class = "factor")), .Names = c("patient.ID", 
"Index.admission.", "adm_date", "dish_date"), class = "data.frame", row.names = c(NA, 
-14L))

Upvotes: 3

Views: 111

Answers (1)

alexwhitworth

Reputation: 4907

This should work. Note that I get a data.table type error when I run it, but the answer is correct.

One caveat: this calculates the time to readmission from the first dish_date meeting your criteria, which is what you request in the post: "subtracts the adm_date from the dish_date (of a previous row)". You don't specify which previous row, so I'm taking the first dish_date meeting your criteria.

From your example output, that's not exactly what you're doing. Instead, you appear to be using some other rule to choose the "previous row", and it's not clear what that rule is. Clarify the question if you want a different output.

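calc_readmit <- function(df) {
  # A single-row group has no earlier admission; return an integer NA so
  # that every group yields the same type
  if (nrow(df) == 1) return(NA_integer_)
  # Number of index admissions strictly before each row
  admitted <- c(0, cumsum(df$Index_admission))
  admitted <- admitted[-length(admitted)]
  # Discharge date of the first index admission in the group
  dt1 <- df$dish_date[min(which(admitted > 0)) - 1]
  admit2 <- ifelse(admitted > 0, dt1, NA)
  # Days from that discharge to each later admission (dates as day counts)
  time <- as.integer(df$adm_date) - admit2
  as.integer(ifelse(admitted > 0, time, NA))
}

library(data.table)
df <- data.table(df, key = "id")
df <- df[, time_to_readmission := calc_readmit(.SD), by = "id"]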

R> df
      id Index_admission.   adm_date  dish_date time_to_readmission
 1: 1243             TRUE 2009-02-05 2009-02-08                  NA
 2: 1243             TRUE 2011-02-05 2011-02-19                 727
 3: 1244            FALSE 2009-02-07 2009-02-08                  NA
 4: 1244             TRUE 2009-03-05 2009-03-15                  NA
 5: 1244            FALSE 2011-04-05 2011-04-07                 751
 6: 1244            FALSE 2012-03-25 2012-03-27                1106
 7: 1244             TRUE 2012-05-05 2012-05-20                1147
 8: 1244             TRUE 2013-09-08 2013-09-15                1638
 9: 1244            FALSE 2014-01-05 2014-01-15                1757
10: 2333            FALSE 2010-01-01 2010-01-08                  NA
11: 2333            FALSE 2011-01-01 2011-01-05                  NA
12: 2333             TRUE 2011-02-02 2011-02-25                  NA
13: 2333            FALSE 2012-01-25 2012-01-28                 334
14: 5422             TRUE 2015-03-05 2015-03-15                  NA
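
The output above measures from the first index admission, which is why rows 8 and 9 (1638 and 1757) differ from the 476 and 112 in the question's example. If the intended rule is the discharge date of the most recent earlier index admission, a minimal base R sketch could look like the following. It assumes rows are already in admission order within each patient. The column names `id` and `index_adm` and the helper `calc_readmit_nearest` are made up for illustration, not taken from the post.

```r
# Sketch only: column names and the helper name are assumptions.
df <- data.frame(
  id        = c(1244, 1244, 1244, 1244, 1244, 1244, 1244,
                2333, 2333, 2333, 2333, 5422, 1243, 1243),
  index_adm = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
                FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  adm_date  = c("2/7/2009", "3/5/2009", "4/5/2011", "3/25/2012", "5/5/2012",
                "9/8/2013", "1/5/2014", "1/1/2010", "1/1/2011", "2/2/2011",
                "1/25/2012", "3/5/2015", "2/5/2009", "2/5/2011"),
  dish_date = c("2/8/2009", "3/15/2009", "4/7/2011", "3/27/2012", "5/20/2012",
                "9/15/2013", "1/15/2014", "1/8/2010", "1/5/2011", "2/25/2011",
                "1/28/2012", "3/15/2015", "2/8/2009", "2/19/2011"),
  stringsAsFactors = FALSE
)
df$adm_date  <- as.Date(df$adm_date,  format = "%m/%d/%Y")
df$dish_date <- as.Date(df$dish_date, format = "%m/%d/%Y")

# For one patient's rows (in admission order): days from the discharge of the
# most recent earlier index admission to each admission, NA if there is none.
calc_readmit_nearest <- function(index_adm, adm_date, dish_date) {
  n <- length(index_adm)
  out <- rep(NA_integer_, n)
  last_dish <- as.Date(NA)  # discharge date of the latest index admission seen so far
  for (i in seq_len(n)) {
    if (!is.na(last_dish)) {
      out[i] <- as.integer(adm_date[i] - last_dish)
    }
    # Update only after computing, so a row never measures from itself
    if (index_adm[i]) last_dish <- dish_date[i]
  }
  out
}

# Apply the helper patient by patient, writing results back in place
df$time_to_readmission <- NA_integer_
for (rows in split(seq_len(nrow(df)), df$id)) {
  df$time_to_readmission[rows] <- calc_readmit_nearest(
    df$index_adm[rows], df$adm_date[rows], df$dish_date[rows])
}
```

With the example data this gives 476 and 112 for the last two rows of patient 1244, matching the desired output in the question. The explicit loop is chosen for readability; for large data a vectorised or data.table version would likely be faster.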

Upvotes: 1
