Sebastian Zeki
Sebastian Zeki

Reputation: 6874

Comparing consecutive rows and select rows where are subsequent is a specific value

I have a dataframe as follows

structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), VisitDate = c("13/02/03", "13/04/05", "13/05/12", 
"13/12/06", "13/04/12", "13/05/13", "13/06/14", "13/04/15", "03/04/15", 
"04/05/16", "04/06/16"), EVENT = c("EMR", "RFA", "nothing", "nothing", 
"EMR", "nothing", "EMR", "EMR", "RFA", "EMR", "nothing")), .Names = c("HospNum_Id", 
"VisitDate", "EVENT"), class = "data.frame", row.names = c(NA, 
-11L))

I want to only select the row where the current row EVENT is "EMR" and the one prior to this (arranged by ascending date order) is "nothing" for each HospNum_Id.

My desired output is:

 HospNum_Id VisitDate EVENT
    2   13/12/06    nothing
    2   13/04/12    EMR
    2   13/05/13    nothing
    2   13/06/14    EMR

but my current output is:

  HospNum_Id VisitDate EVENT
       (int)     (chr) (chr)
1          2  13/04/12   EMR
2          2  13/06/14   EMR
3          2  13/04/15   EMR

Currently I have the following code but its letting me down I think because I am using first in the filter rather than a phrase meaning "before the row that has EMR in the EVENT":

Upstaging<-Therap %>% 
  arrange(HospNum_Id, as.Date(Therap$VisitDate, '%d/%m/%y')) %>% 
  group_by(HospNum_Id) %>% 
  filter(first(EVENT == "nothing") & EVENT == "EMR")

Upvotes: 1

Views: 497

Answers (2)

S.De
S.De

Reputation: 106

just using elementary operation the desired result can be obtained.

Step 1. Load the data(

Step 2. Arrange the data frame by ascending date order

Step 3. Select the rows having event="EMR" and create a data frame and create a data frame containing the previous rows.

Step 4. Remove duplicate and sort according to date

a<-loaded dataframe
a[order(as.Date(a$VisitDate,format="%d/%m/%Y")),,drop=FALSE]
revdf <- a[rev(rownames(a)),]
b<- revdf[which(revdf$EVENT=="EMR" ),] 
c<- revdf[which(revdf$EVENT=="EMR" )-1,]
d<-rbind(b,c)
e<-d[!duplicated(d),] 
f<-e[order(as.Date(e$VisitDate,format="%d/%m/%Y")),,drop=FALSE]
revdf1<-f[rev(rownames(f)),]

output:

  >revdf1
        HospNum_Id  VisitDate  EVENT
   11          3  04/06/16 nothing
   10          3  04/05/16     EMR
   8           2  13/04/15     EMR
   9           3  03/04/15     RFA
   7           2  13/06/14     EMR
   3           1  13/05/12 nothing
   5           2  13/04/12     EMR
   2           1  13/04/05     RFA
   1           1  13/02/03     EMR

Upvotes: 0

akrun
akrun

Reputation: 887118

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'HospNum_Id', we get the index ('i1') where 'EVENT' is "EMR" and the previous value is "nothing". Use that index to get the previous element index ('i1-1') sort and get the row index (.I). With that, we subset the rows.

library(data.table)
v1 <- setDT(df1)[,  {i1 <- which(EVENT == "EMR" & shift(EVENT)=="nothing")
              .I[sort(c(i1, i1-1))] } , by = HospNum_Id]$V1
df1[v1]
#   HospNum_Id VisitDate   EVENT
#1:          2  13/12/06 nothing
#2:          2  13/04/12     EMR
#3:          2  13/05/13 nothing
#4:          2  13/06/14     EMR

Or with similar methodology from dplyr.

library(dplyr)
df1 %>%
    group_by(HospNum_Id) %>% 
    mutate(ind = EVENT=="nothing" & lead(EVENT)=="EMR") %>% 
    slice(sort(c(which(ind),which(ind)+1))) %>% 
    select(-ind)
#   HospNum_Id VisitDate   EVENT   
#      <int>     <chr>   <chr>
#1          2  13/12/06 nothing
#2          2  13/04/12     EMR
#3          2  13/05/13 nothing
#4          2  13/06/14     EMR

Upvotes: 1

Related Questions