How to find first occurrence within a group when grouping by?

Question

Here is an example of the input after sorting by date. The number of dates per id is unknown, and gaps between dates are possible:

input <-  tribble(
  
         ~id, ~date, ~outcome,
         
           1, "2000/01/01", FALSE,
           1, "2000/01/02", FALSE,
           1, "2000/01/03", TRUE,
           1, "2000/01/04", FALSE,
         
           2, "2000/01/01", TRUE,
           2, "2000/01/02", FALSE,
           2, "2000/01/03", TRUE,
           2, "2000/01/04", FALSE,
         
           3, "2000/01/01", FALSE,
           3, "2000/01/02", FALSE,
           3, "2000/01/03", FALSE,
           3, "2000/01/04", TRUE
         )

Within each id, I want to keep every row from the first row where outcome is TRUE onward. Here is the desired output:

output <-  tribble(
  
         ~id, ~date, ~outcome,
         
           1, "2000/01/03", TRUE,
           1, "2000/01/04", FALSE,
         
           2, "2000/01/01", TRUE,
           2, "2000/01/02", FALSE,
           2, "2000/01/03", TRUE,
           2, "2000/01/04", FALSE,
         
           3, "2000/01/04", TRUE
         )

I have tried tidyverse constructs with group_by() but with no success:

input %>%
  group_by(id) %>%
  ???

akrun · Accepted Answer

After grouping by 'id', filter on the cumsum of the logical column (TRUE is coerced to 1 and FALSE to 0). The cumulative sum stays at 0 until the first TRUE in the group, becomes 1 there, and never decreases afterwards, so the condition > 0 keeps only the rows from the first occurrence of TRUE onward.

library(dplyr)
input %>% 
    group_by(id) %>% 
    filter(cumsum(outcome) > 0) %>%
    ungroup()

-output

# A tibble: 7 x 3
     id date       outcome
  <dbl> <chr>      <lgl>  
1     1 2000/01/03 TRUE   
2     1 2000/01/04 FALSE  
3     2 2000/01/01 TRUE   
4     2 2000/01/02 FALSE  
5     2 2000/01/03 TRUE   
6     2 2000/01/04 FALSE  
7     3 2000/01/04 TRUE
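
To make the cumsum step concrete, here is a minimal base R sketch on the outcome values of id 1 and id 2, copied from the example input. The variable names (outcome_id1, running, keep, outcome_id2) are only illustrative.

```r
# Outcome column for id 1, copied from the example input
outcome_id1 <- c(FALSE, FALSE, TRUE, FALSE)

# cumsum() coerces FALSE to 0 and TRUE to 1, then accumulates
running <- cumsum(outcome_id1)   # 0 0 1 1

# FALSE before the first TRUE, TRUE from the first TRUE onward
keep <- running > 0              # FALSE FALSE TRUE TRUE

# For id 2 the first row is already TRUE, so every row passes the filter
outcome_id2 <- c(TRUE, FALSE, TRUE, FALSE)
keep_id2 <- cumsum(outcome_id2) > 0   # TRUE TRUE TRUE TRUE
```

Inside filter() after group_by(id), the same computation is applied to each group separately.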

Another option is to use match, which returns the index of the first TRUE value, and keep the rows from that position onward:

input %>%
     group_by(id) %>%
     filter(row_number() >= match(TRUE, outcome))
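
One case the example input does not cover is a group that never has a TRUE. The sketch below uses a small hypothetical tibble (no_true, with a made-up id 4) to check what both approaches do there. As far as I can tell, match() returns NA for such a group, filter() drops rows whose condition is NA, and so both versions remove the group entirely.

```r
library(dplyr)

# Hypothetical data (not from the question): id 4 never has a TRUE outcome
no_true <- tibble(
  id      = c(1, 1, 1, 4, 4),
  outcome = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)

# match() gives the position of the first TRUE, or NA when there is none
pos_found   <- match(TRUE, c(FALSE, TRUE, FALSE))  # 2
pos_missing <- match(TRUE, c(FALSE, FALSE))        # NA

# match-based filter: the comparison with NA is NA, so id 4 is dropped
by_match <- no_true %>%
  group_by(id) %>%
  filter(row_number() >= match(TRUE, outcome)) %>%
  ungroup()

# cumsum-based filter: the running sum stays at 0, so id 4 is dropped too
by_cumsum <- no_true %>%
  group_by(id) %>%
  filter(cumsum(outcome) > 0) %>%
  ungroup()
```

If groups without any TRUE should be kept instead, the condition would need an explicit extra case; that is outside what the question asks for.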

