Seydou GORO
Seydou GORO

Reputation: 1285

How to select all rows between two events

Here is a representation of my dataset

    set.seed(1)
library (tidyverse)

Date<-c(1:6,1:8,1:10)
ID<-c(rep(1,3*2),rep(2,4*2),rep(3,5*2))
Surgery<-c(c("Surg1",NA,NA,NA,"Surg2",NA),
        c(NA,NA,NA,"Surg.a",NA,NA,"Surg.f",NA),
        c("Surg.C",NA,NA,"Surg.A",NA,NA,"Surg.X",NA,NA,NA))
Complication<-sample(c(rep("Infection",8),rep("Pain",7),rep("bleeding",5),rep("Oedema",4)))
NumberOfSurgery<-c(rep(2,6),rep(2,8),rep(3,10))
OrderOfSurgery<-c(1,rep(NA,3),2,rep(NA,4),1,NA,NA,2,NA,1,NA,NA,2,NA,NA,3,rep(NA,3))

mydata<-data.frame(ID,Date,Surgery,Complication,NumberOfSurgery,OrderOfSurgery)
mydata

There are three patients. The first one has benefited from two surgeries over time, the second one 2 also and the last one three surgeries.

I would like to select all complications from the first day of surgery 1 to the first day of surgery 2, and for all individuals; in order to have such a dataset below

   ID Date Surgery Complication NumberOfSurgery OrderOfSurgery
1   1    1   Surg1    Infection               2              1
2   1    2    <NA>    Infection               2             NA
3   1    3    <NA>    Infection               2             NA
4   1    4    <NA>    Infection               2             NA
5   1    5   Surg2         Pain               2              2
10  2    4  Surg.a     bleeding               2              1
11  2    5    <NA>         Pain               2             NA
12  2    6    <NA>    Infection               2             NA
13  2    7  Surg.f     bleeding               2              2
15  3    1  Surg.C         Pain               3              1
16  3    2    <NA>         Pain               3             NA
17  3    3    <NA>         Pain               3             NA
18  3    4  Surg.A     bleeding               3              2

Here is how I proceeded: The dates are already arranged

Firstly, I suppressed for the second individuals all the observation that occurred before the first surgery (that is from the date 1 to the date 3).

mydata2<-mydata%>%mutate(Surgery2=Surgery,OrderOfSurgery2=OrderOfSurgery)%>%
  group_by(ID)%>%fill(Surgery2,OrderOfSurgery2)%>%filter(!is.na(Surgery2))

Then I could select only all observations that follow the first surgery, by doing this:

mydata3<-mydata2%>%filter(OrderOfSurgery2==1)

What I want to do is to include the row at the date of the second surgery,as I mentionned above.

Upvotes: 0

Views: 143

Answers (1)

Zaw
Zaw

Reputation: 1474

Using lag() and shift OrderOfSurgery2 down by one position, filling the blank with the default value of 1 should do.

mydata3 <- mydata2 %>%
  mutate(OrderOfSurgery2 = lag(OrderOfSurgery2, default = 1)) %>%
  filter(OrderOfSurgery2 == 1)

# # A tibble: 13 x 8
# # Groups:   ID [3]
#       ID  Date Surgery Complication NumberOfSurgery OrderOfSurgery Surgery2
#    <dbl> <int> <chr>   <chr>                  <dbl>          <dbl> <chr>   
#  1     1     1 Surg1   Infection                  2              1 Surg1   
#  2     1     2 NA      Infection                  2             NA Surg1   
#  3     1     3 NA      Infection                  2             NA Surg1   
#  4     1     4 NA      Infection                  2             NA Surg1    
#  5     1     5 Surg2   Pain                       2              2 Surg2    
#  6     2     4 Surg.a  bleeding                   2              1 Surg.a   
#  7     2     5 NA      Pain                       2             NA Surg.a   
#  8     2     6 NA      Infection                  2             NA Surg.a   
#  9     2     7 Surg.f  bleeding                   2              2 Surg.f  
# <Omitted>

Upvotes: 1

Related Questions