yeshead
yeshead

Reputation: 45

Time since last event of grouped data in R

I have a data frame that contains a grouping variable (ID), a date and an event column with numeric values, in which 0 represent no event and >0 represents an event. An example data frame can be generated with the following code:

df <- data.frame(ID = c (1, 1, 1, 1, 2, 2, 2),
                   date = as.Date(c("2014-08-03", "2014-08-04", "2014-08-07", "2014-08-10", "2015-07-01", "2015-07-03", "2015-08-01")),
                   event = c(1, 0, 3, 0, 0, 4, 0))
df
> df
  ID       date event
1  1 2014-08-03     1
2  1 2014-08-04     0
3  1 2014-08-07     3
4  1 2014-08-10     0
5  2 2015-07-01     0
6  2 2015-07-03     4
7  2 2015-08-01     0

Now, I want to calculate the time that has passed since any last event (>0) has occured. In the particular case that the first entry/entries for any ID contains no event, "NA" should be generated. My desired output would look like this:

> df
  ID       date event tae
1  1 2014-08-03     1   0
2  1 2014-08-04     0   1
3  1 2014-08-07     3   0
4  1 2014-08-10     0   3
5  2 2015-07-01     0  NA
6  2 2015-07-03     4   0
7  2 2015-08-01     0  29

I have tried several different approaches. The closest I got was this:

library(dplyr)

df %>%
  mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event))))) %>%
  group_by(ID) %>%
  mutate(tmp = c(0, diff(date)) * !event) %>%
  group_by(tmpG) %>%
  mutate(tae = cumsum(tmp)) %>%
  ungroup() %>%
  select(-c(tmp, tmpG))

# A tibble: 7 x 4
     ID date       event   tae
  <dbl> <date>     <dbl> <dbl>
1     1 2014-08-03     1     0
2     1 2014-08-04     0     1
3     1 2014-08-07     3     0
4     1 2014-08-10     0     3
5     2 2015-07-01     0     3
6     2 2015-07-03     4     0
7     2 2015-08-01     0    29

Any suggestions on how to get that code running (or any other alternative) would be greatly appreciated.

Upvotes: 4

Views: 473

Answers (3)

NicChr
NicChr

Reputation: 1253

An approach that uses only dplyr.

df %>%
  mutate(event_id = cumsum(event),
         .by = ID) %>%
  mutate(tae = if_else(event_id == 0L, NA_real_, 
                       as.numeric(date - first(date))), 
         .by = c(ID, event_id))
  ID       date event event_id tae
1  1 2014-08-03     1        1   0
2  1 2014-08-04     0        1   1
3  1 2014-08-07     3        4   0
4  1 2014-08-10     0        4   3
5  2 2015-07-01     0        0  NA
6  2 2015-07-03     4        4   0
7  2 2015-08-01     0        4  29

Upvotes: 0

Ben
Ben

Reputation: 30474

Here is another tidyverse approach, that uses fill to carry forward the most recent event.

library(tidyverse)

df %>%
  group_by(ID) %>%
  mutate(last_event = if_else(event > 0, date, NA_real_)) %>%
  fill(last_event) %>%
  mutate(tae = as.numeric(date - last_event))

Output

     ID date       event last_event   tae
  <dbl> <date>     <dbl> <date>     <dbl>
1     1 2014-08-03     1 2014-08-03     0
2     1 2014-08-04     0 2014-08-03     1
3     1 2014-08-07     3 2014-08-07     0
4     1 2014-08-10     0 2014-08-07     3
5     2 2015-07-01     0 NA            NA
6     2 2015-07-03     4 2015-07-03     0
7     2 2015-08-01     0 2015-07-03    29

Upvotes: 3

langtang
langtang

Reputation: 24722

df %>%
  group_by(ID) %>%
  mutate(tae = as.double(if_else(event==0, date-lag(date), 0)))

Output:

     ID date       event   tae
  <dbl> <date>     <dbl> <dbl>
1     1 2014-08-03     1     0
2     1 2014-08-04     0     1
3     1 2014-08-07     3     0
4     1 2014-08-10     0     3
5     2 2015-07-01     0    NA
6     2 2015-07-03     4     0
7     2 2015-08-01     0    29

Upvotes: 0

Related Questions