Comparing consecutive rows and select rows where are subsequent is a specific value

Question

I have a dataframe as follows

structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), VisitDate = c("13/02/03", "13/04/05", "13/05/12", 
"13/12/06", "13/04/12", "13/05/13", "13/06/14", "13/04/15", "03/04/15", 
"04/05/16", "04/06/16"), EVENT = c("EMR", "RFA", "nothing", "nothing", 
"EMR", "nothing", "EMR", "EMR", "RFA", "EMR", "nothing")), .Names = c("HospNum_Id", 
"VisitDate", "EVENT"), class = "data.frame", row.names = c(NA, 
-11L))

I want to only select the row where the current row EVENT is "EMR" and the one prior to this (arranged by ascending date order) is "nothing" for each HospNum_Id.

My desired output is:

 HospNum_Id VisitDate EVENT
    2   13/12/06    nothing
    2   13/04/12    EMR
    2   13/05/13    nothing
    2   13/06/14    EMR

but my current output is:

  HospNum_Id VisitDate EVENT
       (int)     (chr) (chr)
1          2  13/04/12   EMR
2          2  13/06/14   EMR
3          2  13/04/15   EMR

Currently I have the following code but its letting me down I think because I am using first in the filter rather than a phrase meaning "before the row that has EMR in the EVENT":

Upstaging<-Therap %>% 
  arrange(HospNum_Id, as.Date(Therap$VisitDate, '%d/%m/%y')) %>% 
  group_by(HospNum_Id) %>% 
  filter(first(EVENT == "nothing") & EVENT == "EMR")

akrun · Accepted Answer

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'HospNum_Id', we get the index ('i1') where 'EVENT' is "EMR" and the previous value is "nothing". Use that index to get the previous element index ('i1-1') sort and get the row index (.I). With that, we subset the rows.

library(data.table)
v1 <- setDT(df1)[,  {i1 <- which(EVENT == "EMR" & shift(EVENT)=="nothing")
              .I[sort(c(i1, i1-1))] } , by = HospNum_Id]$V1
df1[v1]
#   HospNum_Id VisitDate   EVENT
#1:          2  13/12/06 nothing
#2:          2  13/04/12     EMR
#3:          2  13/05/13 nothing
#4:          2  13/06/14     EMR

Or with similar methodology from dplyr.

library(dplyr)
df1 %>%
    group_by(HospNum_Id) %>% 
    mutate(ind = EVENT=="nothing" & lead(EVENT)=="EMR") %>% 
    slice(sort(c(which(ind),which(ind)+1))) %>% 
    select(-ind)
#   HospNum_Id VisitDate   EVENT   
#              
#1          2  13/12/06 nothing
#2          2  13/04/12     EMR
#3          2  13/05/13 nothing
#4          2  13/06/14     EMR

Comparing consecutive rows and select rows where are subsequent is a specific value

Answers (2)

Related Questions