Reputation: 2090
Here is an example of the input, already sorted by date. The number of dates per id is unknown, and gaps between dates are possible:
input <- tribble(
~id, ~date, ~outcome,
1, "2000/01/01", FALSE,
1, "2000/01/02", FALSE,
1, "2000/01/03", TRUE,
1, "2000/01/04", FALSE,
2, "2000/01/01", TRUE,
2, "2000/01/02", FALSE,
2, "2000/01/03", TRUE,
2, "2000/01/04", FALSE,
3, "2000/01/01", FALSE,
3, "2000/01/02", FALSE,
3, "2000/01/03", FALSE,
3, "2000/01/04", TRUE
)
For each id, I want to keep every row from the first TRUE outcome onward. Here is the desired output:
output <- tribble(
~id, ~date, ~outcome,
1, "2000/01/03", TRUE,
1, "2000/01/04", FALSE,
2, "2000/01/01", TRUE,
2, "2000/01/02", FALSE,
2, "2000/01/03", TRUE,
2, "2000/01/04", FALSE,
3, "2000/01/04", TRUE
)
I have tried tidyverse constructs with group_by(), but with no success:
input %>%
group_by(id) %>%
???
Upvotes: 1
Views: 114
Reputation: 887048
After grouping by 'id', filter on the cumsum of the logical column (TRUE -> 1 and FALSE -> 0). The cumulative sum stays at 0 until the first TRUE, where it becomes 1, and it increases by 1 at each later TRUE. Keeping the rows where it is > 0 therefore returns every row from the first occurrence of TRUE onward:
library(dplyr)
input %>%
group_by(id) %>%
filter(cumsum(outcome) > 0) %>%
ungroup()
-output
# A tibble: 7 x 3
id date outcome
<dbl> <chr> <lgl>
1 1 2000/01/03 TRUE
2 1 2000/01/04 FALSE
3 2 2000/01/01 TRUE
4 2 2000/01/02 FALSE
5 2 2000/01/03 TRUE
6 2 2000/01/04 FALSE
7 3 2000/01/04 TRUE
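To see why the cumulative sum works, it may help to look at a single group as a plain vector. This is a minimal base R sketch; `x` is a hypothetical vector standing in for one id's outcome column, not data from the question.

```r
# Hypothetical outcome values for a single id (illustration only)
x <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# Logicals are coerced to 0/1, so the running total stays 0 until the first TRUE
running <- cumsum(x)
print(running)   # 0 0 1 1 2

# Comparing with 0 flags the first TRUE and every position after it
keep <- running > 0
print(keep)      # FALSE FALSE TRUE TRUE TRUE

# Subsetting with the flag keeps the tail that starts at the first TRUE
print(x[keep])   # TRUE FALSE TRUE
```

Inside `group_by(id)`, the same computation restarts for each id, which is what makes the filter work per group.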
Another option is match, which returns the index of the first TRUE value; keep the rows from that position onward:
input %>%
group_by(id) %>%
filter(row_number() >= match(TRUE, outcome))
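The match approach can be checked the same way on a plain vector. This is a small base R sketch with hypothetical vectors `x` and `y` (not from the question), where `seq_along()` plays the role of `row_number()`.

```r
# Hypothetical outcome values for a single id (illustration only)
x <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# match() returns the position of the first element equal to TRUE
first_true <- match(TRUE, x)
print(first_true)   # 3

# Positions at or after the first TRUE are kept
keep <- seq_along(x) >= first_true
print(keep)         # FALSE FALSE TRUE TRUE TRUE

# A group with no TRUE at all: match() returns NA, so the comparison is NA
y <- c(FALSE, FALSE)
print(match(TRUE, y))                   # NA
print(seq_along(y) >= match(TRUE, y))   # NA NA
```

Because dplyr's `filter()` drops rows where the condition evaluates to NA, an id with no TRUE outcome would be removed entirely, which matches the behaviour of the cumsum version.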
Upvotes: 3
Reputation: 196
Use:
library(tidyverse)
input <- input %>%
group_by(id) %>%
arrange(id, date) %>% ## if not ordered already
## which(outcome)[1] is NA when an id has no TRUE, so this assumes every id has at least one TRUE
mutate(outcome2 = replace(outcome, which(outcome)[1]:n(), TRUE)) %>%
filter(outcome2 == TRUE) %>%
select(-outcome2) %>%
ungroup()
Upvotes: 0