Reputation: 123
Below is an example of a table I am working with.
df = data.frame(Test_ID = c('a1','a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a2'),
Event_ID = c('Failure_x', 'Failure_x', 'Failure_y', 'Failure_y', 'Failure_x',
'Failure_x', 'Failure_y', 'Failure_x', 'Failure_y', 'Failure_y',
'Failure_x','Failure_x', 'Failure_y'),
Fail_Date = c('2018-10-10 17:52:20', '2018-10-11 17:02:16', '2018-10-14 12:52:20',
'2018-11-11 16:18:34', '2018-11-12 17:03:06', '2018-11-25 10:50:10',
'2018-12-01 10:28:50', '2018-09-12 19:02:08', '2018-09-20 11:32:25',
'2018-10-13 14:43:30', '2018-10-15 14:22:28', '2018-10-30 21:55:45',
'2018-11-17 11:53:35'))
I want to subtract the failure dates (by Test_ID) only where Failure_y occurs after Failure_x. The Fail_Date for Event_ID Failure_y will be subtracted from the Fail_Date for Event_ID Failure_x. Within a group I can have multiple Failure_y's. The second Failure_y will be subtracted from the Failure_x occurring after the first instance of Failure_y.
I have tried to use dplyr to create a column TIME_BETWEEN_FAILURES.
library(lubridate)
df$Fail_Date = as.POSIXct(as.character(as.factor(df$Fail_Date)),format="%Y-%m-%d %H:%M:%S")
df = df %>% group_by(Test_ID) %>%
mutate(TIME_BETWEEN_FAILURES = ifelse(Event_ID == "Failure_y" & lag(Event_ID) == "Failure_x",
difftime(Fail_Date, first(Fail_Date),units = "hours"),''))`
I was able to create the Time_BETWEEN_FAILURES correctly only for the first instance using first() in dplyr. That's where I am currently stuck. Any help on this matter will be appreciated.
This is result from the code snippet above.
Output required for analysis.
This is ideal response needed for my analysis.
Thanks. Cheers.
Upvotes: 2
Views: 313
Reputation: 7373
df %>%
group_by(gr = rev(cumsum(rev(Event_ID)=="Failure_y")), Test_ID) %>%
mutate(time_between_failures = ifelse(n() > 1 & Event_ID=="Failure_y", difftime(Fail_Date[n()], Fail_Date[1L], units = "hours"), NA))
# A tibble: 13 x 5
# Groups: gr, Test_ID [6]
Test_ID Event_ID Fail_Date gr time_between_failures
<fct> <fct> <dttm> <int> <dbl>
1 a1 Failure_x 2018-10-10 17:52:20 6 NA
2 a1 Failure_x 2018-10-11 17:02:16 6 NA
3 a1 Failure_y 2018-10-14 12:52:20 6 91
4 a1 Failure_y 2018-11-11 16:18:34 5 NA
5 a1 Failure_x 2018-11-12 17:03:06 4 NA
6 a1 Failure_x 2018-11-25 10:50:10 4 NA
7 a1 Failure_y 2018-12-01 10:28:50 4 449.
8 a2 Failure_x 2018-09-12 19:02:08 3 NA
9 a2 Failure_y 2018-09-20 11:32:25 3 185.
10 a2 Failure_y 2018-10-13 14:43:30 2 NA
11 a2 Failure_x 2018-10-15 14:22:28 1 NA
12 a2 Failure_x 2018-10-30 21:55:45 1 NA
13 a2 Failure_y 2018-11-17 11:53:35 1 790.
Upvotes: 1