Reputation: 6874
I have a dataframe as follows
structure(list(HospNum_Id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), VisitDate = c("13/02/03", "13/04/05", "13/05/12",
"13/12/06", "13/04/12", "13/05/13", "13/06/14", "13/04/15", "03/04/15",
"04/05/16", "04/06/16"), EVENT = c("EMR", "RFA", "nothing", "nothing",
"EMR", "nothing", "EMR", "EMR", "RFA", "EMR", "nothing")), .Names = c("HospNum_Id",
"VisitDate", "EVENT"), class = "data.frame", row.names = c(NA,
-11L))
I want to only select the row where the current row EVENT
is "EMR"
and the one prior to this (arranged by ascending date order) is "nothing" for each HospNum_Id
.
My desired output is:
HospNum_Id VisitDate EVENT
2 13/12/06 nothing
2 13/04/12 EMR
2 13/05/13 nothing
2 13/06/14 EMR
but my current output is:
HospNum_Id VisitDate EVENT
(int) (chr) (chr)
1 2 13/04/12 EMR
2 2 13/06/14 EMR
3 2 13/04/15 EMR
Currently I have the following code but its letting me down I think because I am using first in the filter rather than a phrase meaning "before the row that has EMR in the EVENT":
Upstaging<-Therap %>%
arrange(HospNum_Id, as.Date(Therap$VisitDate, '%d/%m/%y')) %>%
group_by(HospNum_Id) %>%
filter(first(EVENT == "nothing") & EVENT == "EMR")
Upvotes: 1
Views: 497
Reputation: 106
just using elementary operation the desired result can be obtained.
Step 1. Load the data(
Step 2. Arrange the data frame by ascending date order
Step 3. Select the rows having event="EMR" and create a data frame and create a data frame containing the previous rows.
Step 4. Remove duplicate and sort according to date
a<-loaded dataframe
a[order(as.Date(a$VisitDate,format="%d/%m/%Y")),,drop=FALSE]
revdf <- a[rev(rownames(a)),]
b<- revdf[which(revdf$EVENT=="EMR" ),]
c<- revdf[which(revdf$EVENT=="EMR" )-1,]
d<-rbind(b,c)
e<-d[!duplicated(d),]
f<-e[order(as.Date(e$VisitDate,format="%d/%m/%Y")),,drop=FALSE]
revdf1<-f[rev(rownames(f)),]
output:
>revdf1
HospNum_Id VisitDate EVENT
11 3 04/06/16 nothing
10 3 04/05/16 EMR
8 2 13/04/15 EMR
9 3 03/04/15 RFA
7 2 13/06/14 EMR
3 1 13/05/12 nothing
5 2 13/04/12 EMR
2 1 13/04/05 RFA
1 1 13/02/03 EMR
Upvotes: 0
Reputation: 887118
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'HospNum_Id', we get the index ('i1') where 'EVENT' is "EMR" and the previous value is "nothing". Use that index to get the previous element index ('i1-1') sort
and get the row index (.I
). With that, we subset the rows.
library(data.table)
v1 <- setDT(df1)[, {i1 <- which(EVENT == "EMR" & shift(EVENT)=="nothing")
.I[sort(c(i1, i1-1))] } , by = HospNum_Id]$V1
df1[v1]
# HospNum_Id VisitDate EVENT
#1: 2 13/12/06 nothing
#2: 2 13/04/12 EMR
#3: 2 13/05/13 nothing
#4: 2 13/06/14 EMR
Or with similar methodology from dplyr
.
library(dplyr)
df1 %>%
group_by(HospNum_Id) %>%
mutate(ind = EVENT=="nothing" & lead(EVENT)=="EMR") %>%
slice(sort(c(which(ind),which(ind)+1))) %>%
select(-ind)
# HospNum_Id VisitDate EVENT
# <int> <chr> <chr>
#1 2 13/12/06 nothing
#2 2 13/04/12 EMR
#3 2 13/05/13 nothing
#4 2 13/06/14 EMR
Upvotes: 1