pietrodito

Reputation: 2090

How to find the first occurrence within a group when grouping by?

Here is an example of the input after sorting by date. The number of dates per id is unknown, and gaps between dates are possible:

input <-  tribble(
  
         ~id, ~date, ~outcome,
         
           1, "2000/01/01", FALSE,
           1, "2000/01/02", FALSE,
           1, "2000/01/03", TRUE,
           1, "2000/01/04", FALSE,
         
           2, "2000/01/01", TRUE,
           2, "2000/01/02", FALSE,
           2, "2000/01/03", TRUE,
           2, "2000/01/04", FALSE,
         
           3, "2000/01/01", FALSE,
           3, "2000/01/02", FALSE,
           3, "2000/01/03", FALSE,
           3, "2000/01/04", TRUE
         )

Within each id, I want to keep every row from the first TRUE outcome onward. Here is the desired output:

output <-  tribble(
  
         ~id, ~date, ~outcome,
         
           1, "2000/01/03", TRUE,
           1, "2000/01/04", FALSE,
         
           2, "2000/01/01", TRUE,
           2, "2000/01/02", FALSE,
           2, "2000/01/03", TRUE,
           2, "2000/01/04", FALSE,
         
           3, "2000/01/04", TRUE
         )

I have tried tidyverse constructs with group_by() but with no success:

input %>%
  group_by(id) %>%
  ???

Upvotes: 1

Views: 114

Answers (2)

akrun

Reputation: 887048

After grouping by 'id', filter on the cumsum of the logical column (TRUE -> 1 and FALSE -> 0). The cumulative sum is 0 before the first TRUE, becomes 1 at the first TRUE, and never decreases afterwards, so testing it with > 0 returns only the rows from the first occurrence of TRUE onward.

library(dplyr)
input %>% 
    group_by(id) %>% 
    filter(cumsum(outcome) > 0) %>%
    ungroup

-output

# A tibble: 7 x 3
     id date       outcome
  <dbl> <chr>      <lgl>  
1     1 2000/01/03 TRUE   
2     1 2000/01/04 FALSE  
3     2 2000/01/01 TRUE   
4     2 2000/01/02 FALSE  
5     2 2000/01/03 TRUE   
6     2 2000/01/04 FALSE  
7     3 2000/01/04 TRUE   
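
To see why the cumsum filter works, it may help to look at a single group in isolation. The following is a minimal base R sketch using the outcome values of id 1 from the question; the variable names `running_total` and `keep` are only illustrative.

```r
# Outcome values for id 1 in the question's input
outcome <- c(FALSE, FALSE, TRUE, FALSE)

# TRUE counts as 1 and FALSE as 0, so the running total stays at 0
# until the first TRUE and is at least 1 from then on
running_total <- cumsum(outcome)
running_total
# [1] 0 0 1 1

# Rows where the running total is positive are the ones filter() keeps
keep <- running_total > 0
keep
# [1] FALSE FALSE  TRUE  TRUE
```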

Another option is match, which returns the index of the first TRUE value; keep the rows whose row number is at or after that index

input %>%
     group_by(id) %>%
     filter(row_number() >= match(TRUE, outcome))
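
One case the question's data does not cover is an id with no TRUE at all. As a rough sketch, assuming a hypothetical id 4 that is not in the original input: match() returns NA for that group, the comparison is NA for every row, and filter() drops rows whose condition is NA, so the whole group disappears (the cumsum version drops it as well, since the running total never exceeds 0).

```r
library(dplyr)

# Hypothetical data: id 4 never has a TRUE outcome
no_true <- tribble(
  ~id, ~outcome,
    1, FALSE,
    1, TRUE,
    1, FALSE,
    4, FALSE,
    4, FALSE
)

# For id 4, match(TRUE, outcome) is NA, so every comparison is NA
# and filter() removes all of that group's rows
result <- no_true %>%
  group_by(id) %>%
  filter(row_number() >= match(TRUE, outcome)) %>%
  ungroup()

result
```

Only the last two rows of id 1 should remain here.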

data

 input <-  tribble(
  
         ~id, ~date, ~outcome,
         
           1, "2000/01/01", FALSE,
           1, "2000/01/02", FALSE,
           1, "2000/01/03", TRUE,
           1, "2000/01/04", FALSE,
         
           2, "2000/01/01", TRUE,
           2, "2000/01/02", FALSE,
           2, "2000/01/03", TRUE,
           2, "2000/01/04", FALSE,
         
           3, "2000/01/01", FALSE,
           3, "2000/01/02", FALSE,
           3, "2000/01/03", FALSE,
           3, "2000/01/04", TRUE
         )

Upvotes: 3

forhad

Reputation: 196

Use:

library(tidyverse)

input <- input %>%
  group_by(id) %>%
  arrange(id, date) %>% ## if not ordered already
  ## assumes every id has at least one TRUE; otherwise which(...)[1] is NA and `:` errors
  mutate(outcome2 = replace(outcome, which(outcome == TRUE)[1]:n(), TRUE)) %>% 
  filter(outcome2 == TRUE) %>% 
  select(-outcome2) %>% 
  ungroup()
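
The replace() step above can be followed on a single group. This is a small base R sketch using the outcome values of id 1 from the question, with `length(outcome)` standing in for `n()`; the name `first_true` is only illustrative.

```r
# Outcome values for id 1 in the question's input
outcome <- c(FALSE, FALSE, TRUE, FALSE)

# Position of the first TRUE
first_true <- which(outcome)[1]
first_true
# [1] 3

# Set every position from the first TRUE to the end to TRUE
outcome2 <- replace(outcome, first_true:length(outcome), TRUE)
outcome2
# [1] FALSE FALSE  TRUE  TRUE
```

Filtering on `outcome2` then keeps the same rows as the cumsum approach, while the original `outcome` column is left unchanged.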

Upvotes: 0
