Identify gaps in a continuous time period

Question

I have a data frame of observations recording when lines were attached to IDs. I need the period of time, in days, during which each ID had a line/catheter attached.

Here is the output of dput():

structure(list(ID = c(487622L, 487622L, 487639L, 487639L, 489027L, 
489027L, 489027L, 491858L, 491858L, 491858L, 491858L, 491858L, 
491858L), Line = c("Central Venous Line", "Central Venous Line", 
"Central Venous Line", "Peripherally Inserted Central Catheter (PICC)", 
"Haemodialysis Catheter", "Peripherally Inserted Central Catheter (PICC)", 
"Haemodialysis Catheter", "Central Venous Line", "Haemodialysis Catheter", 
"Central Venous Line", "Haemodialysis Catheter", "Central Venous Line", 
"Peripherally Inserted Central Catheter (PICC)"), Start = structure(c(1362528000, 
1363219200, 1362268800, 1363219200, 1364774400, 1365120000, 1365465600, 
1364688000, 1364688000, 1365724800, 1365724800, 1366848000, 1369353600
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), End = structure(c(1362787200, 
1363824000, 1363305600, 1363737600, 1365465600, 1366675200, 1365638400, 
1365724800, 1365724800, 1366329600, 1366848000, 1367539200, 1369612800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Days = c("3.095138889", 
"7.045138889", "11.87777778", "5.736111111", "7.850694444", "18.02083333", 
"1.813888889", "12.32986111", "12.71388889", "6.782638889", "13.14027778", 
"7.718055556", "3.397222222"), dateOrder = c(1L, 2L, 1L, 2L, 
1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L, 6L)), .Names = c("ID", "Line", 
"Start", "End", "Days", "dateOrder"), row.names = 79:91, class = "data.frame")

Here is the catch: it does not matter whether an ID has more than one line/catheter. I just need the earliest start date and the latest end date for each ID, and from those the number of continuous days each ID had a line/catheter attached.
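The earliest start and latest end per ID can be taken with tapply(). A minimal sketch, using only the first four rows of the dput (IDs 487622 and 487639) and working in seconds since 1970 before converting to days; the result is the total span before any gap is subtracted. The helper to_time is a hypothetical name introduced for this example.

```r
# Hypothetical helper: seconds since 1970-01-01 UTC -> POSIXct
to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")

# First four rows of the dput (IDs 487622 and 487639)
DF <- data.frame(
  ID    = c(487622L, 487622L, 487639L, 487639L),
  Start = to_time(c(1362528000, 1363219200, 1362268800, 1363219200)),
  End   = to_time(c(1362787200, 1363824000, 1363305600, 1363737600))
)

# Earliest start and latest end per ID, as numeric seconds
first_start <- tapply(as.numeric(DF$Start), DF$ID, min)
last_end    <- tapply(as.numeric(DF$End), DF$ID, max)

# Total span per ID in days (86400 seconds per day), gaps not yet removed
span_days <- (last_end - first_start) / 86400
span_days
```

Here span_days is 15 for ID 487622 and 17 for ID 487639.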

This is complicated by cases such as ID 491858: this individual had a line removed (dateOrder = 5) on 2013-05-03 and reinserted on 2013-05-24 for just over 3 days.

I intended to handle this by subtracting the gap (in days) from the total span between min(Start) and max(End).
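As a quick check of this approach on ID 491858, using the dates read off the dput (all times there are midnight UTC): the span from the first start to the last end is 57 days and the gap is 21 days, leaving 36 days with a line attached, which matches the 36 reported in the accepted answer.

```r
# Dates for ID 491858, read off the dput
first_start <- as.Date("2013-03-31")  # earliest Start
last_end    <- as.Date("2013-05-27")  # latest End
removed     <- as.Date("2013-05-03")  # End of dateOrder 5
reinserted  <- as.Date("2013-05-24")  # Start of dateOrder 6

span <- as.numeric(last_end - first_start, units = "days")  # 57
gap  <- as.numeric(reinserted - removed, units = "days")    # 21
covered <- span - gap                                       # 36
covered
```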

There are over 20,000 records in the data set.

Here is what I have done so far:

I converted the data frame into a list of data frames, one per ID, and intended to apply a function like the following to each one:

For each row, if the difference in days between the next row's start date and the current row's end date is greater than 0, add TRUE (or some other marker value) to a new column.

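function(y){
    y <- y[order(y$Start), ]   # process rows in chronological order
    y$test <- FALSE
    # loop over row pairs (1, 2), (2, 3), ...; length(y) would count columns
    for (i in seq_len(nrow(y) - 1)){
        if (difftime(y$Start[i+1], y$End[i], units = 'days') > 0){
            y$test[i+1] <- TRUE   # flag only the row that follows a gap
        }
    }
    y   # return the modified data frame
}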
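The same flag can also be computed without the per-row loop. A minimal sketch in base R, assuming the rows are first sorted by ID and Start: ave() shifts each ID's End values down by one row, so every start is compared with the previous row's end, the same comparison the loop makes. The sample rows are copied from the dput (IDs 487622 and 491858 only); to_time and gap_days are hypothetical names introduced for illustration.

```r
# Hypothetical helper: seconds since 1970-01-01 UTC -> POSIXct
to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")

# Rows for IDs 487622 and 491858, copied from the dput
DF <- data.frame(
  ID    = c(487622L, 487622L, rep(491858L, 6)),
  Start = to_time(c(1362528000, 1363219200, 1364688000, 1364688000,
                    1365724800, 1365724800, 1366848000, 1369353600)),
  End   = to_time(c(1362787200, 1363824000, 1365724800, 1365724800,
                    1366329600, 1366848000, 1367539200, 1369612800))
)

# Sort chronologically within each ID
DF <- DF[order(DF$ID, DF$Start), ]

# End of the previous row within the same ID (NA for each ID's first row)
prev_end <- ave(as.numeric(DF$End), DF$ID,
                FUN = function(e) c(NA, e[-length(e)]))

# TRUE where a line starts after the previous one ended, i.e. a gap
DF$test <- !is.na(prev_end) & as.numeric(DF$Start) > prev_end

# Length of each gap in days (0 where there is no gap)
DF$gap_days <- ifelse(DF$test, (as.numeric(DF$Start) - prev_end) / 86400, 0)
DF
```

On these rows the flagged gaps are 5 days for ID 487622 and 21 days for ID 491858.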

Any help would be greatly appreciated.

Thanks.

UPDATE

Ignore the Days column; it is of no use. I intend to aggregate monthly line counts from the unique cases.

alexis_laz · Accepted Answer

I guess something like this might help, unless I've misunderstood something:

unlist(lapply(split(DF, DF$ID), function(x) {
   x <- x[order(x$Start), ]   # rows in chronological order
   totaldays <- difftime(max(x$End), min(x$Start), units = "days")
   n <- nrow(x)
   # gap between each start and the previous row's end
   res <- difftime(x$Start[-1], x$End[-n], units = "days")
   res <- res[res > 0]   # keep only real gaps
   # sum() is 0 when there are no gaps and adds up every gap otherwise
   as.numeric(totaldays) - sum(as.numeric(res)) }))
#487622 487639 489027 491858 
#    10     17     22     36

DF is your dput.
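Comparing each start only with the previous row's end can report a gap that is not there: if an earlier line outlasts a later one, the next start may fall after the later line's end while the earlier line is still attached. A sketch that instead compares each start with the latest End seen so far (cummax()) and subtracts every positive gap. ID 491858 is copied from the dput; ID 999 is a hypothetical case with lines over days [0, 10], [1, 2] and [5, 8], for which a previous-row comparison would subtract a spurious 3-day gap (5 - 2) and report 7 days instead of 10. The names to_time and covered_days are hypothetical.

```r
# Hypothetical helper: seconds since 1970-01-01 UTC -> POSIXct
to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")

# ID 491858 from the dput, plus hypothetical ID 999 whose second line
# sits inside the first (days counted from 1970-01-01)
DF <- data.frame(
  ID    = c(rep(491858L, 6), rep(999L, 3)),
  Start = to_time(c(1364688000, 1364688000, 1365724800, 1365724800,
                    1366848000, 1369353600, c(0, 1, 5) * 86400)),
  End   = to_time(c(1365724800, 1365724800, 1366329600, 1366848000,
                    1367539200, 1369612800, c(10, 2, 8) * 86400))
)

# Days with at least one line attached, for the rows of a single ID
covered_days <- function(x) {
  x <- x[order(x$Start), ]              # chronological order
  start <- as.numeric(x$Start) / 86400  # seconds -> days
  end   <- as.numeric(x$End) / 86400
  total <- max(end) - min(start)
  # latest end among all earlier rows, not just the previous one
  latest_end <- cummax(end)[-length(end)]
  gaps <- start[-1] - latest_end
  total - sum(gaps[gaps > 0])           # subtract every real gap
}

res <- sapply(split(DF, DF$ID), covered_days)
res
```

Here res is 10 for ID 999 and 36 for ID 491858. For the data in the question the result is the same as the accepted answer's, since none of its lines are nested that way.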
