Reputation: 50
If I have the below dataset:
df <- data.frame(ID = c(1,1,1,1,2,2,2,2),
                 x = c("no","yes","no","no","no","no","yes","no"),
                 y = c(1,2,3,4,1,2,3,4))
ID | x | y |
---|---|---|
1 | no | 1 |
1 | yes | 2 |
1 | no | 3 |
1 | no | 4 |
2 | no | 1 |
2 | no | 2 |
2 | yes | 3 |
2 | no | 4 |
ID is an identifier, x indicates whether a condition has been met, and y is the time order within each ID (in a real dataset it would probably be a date). Within each ID, how can I remove the rows that come before the condition was first met, while keeping the row where it was met and every row after it?
The final result should look like this:
ID | x | y |
---|---|---|
1 | yes | 2 |
1 | no | 3 |
1 | no | 4 |
2 | yes | 3 |
2 | no | 4 |
Upvotes: 3
Views: 797
Reputation: 388982
You can use match to get the index of the first 'yes' for each ID and use it in filter or slice.
library(dplyr)
df %>%
group_by(ID) %>%
filter(row_number() >= match('yes', x)) %>%
ungroup
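To see what match contributes here, this base R sketch computes the position of the first 'yes' within each ID directly. It uses the corrected example data from the question; tapply is only used for illustration and is not part of the answer.

```r
df <- data.frame(ID = c(1,1,1,1,2,2,2,2),
                 x = c("no","yes","no","no","no","no","yes","no"),
                 y = c(1,2,3,4,1,2,3,4))

# Position of the first "yes" within each ID (based on row order in the group)
first_idx <- tapply(df$x, df$ID, function(v) match("yes", v))
first_idx  # named by ID: ID 1 -> 2, ID 2 -> 3

# match() returns NA when the vector contains no "yes" at all
match("yes", c("no", "no"))
```

So row_number() >= match('yes', x) keeps rows 2 to 4 for ID 1 and rows 3 to 4 for ID 2.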
With slice:
df %>%
group_by(ID) %>%
slice(match('yes', x):n()) %>%
ungroup
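One edge case worth checking: if an ID never has a 'yes', match returns NA. In the filter version, row_number() >= NA evaluates to NA and filter drops those rows, but in the slice version NA:n() should fail, because R's `:` does not accept NA. Below is a base R sketch of the same match idea with an explicit guard; ID 3 is a hypothetical group with no 'yes', added only to exercise that case.

```r
# ID 3 is hypothetical: a group with no "yes" at all
df <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3),
                 x = c("no","yes","no","no","no","no","yes","no","no","no"),
                 y = c(1,2,3,4,1,2,3,4,1,2))

rows <- seq_len(nrow(df))
# Position of each row within its ID: 1, 2, 3, ...
pos <- ave(rows, df$ID, FUN = seq_along)
# Position of the first "yes" within each ID; NA if the ID has none
first <- ave(rows, df$ID, FUN = function(i) match("yes", df$x[i]))

# Keep rows at or after the first "yes"; IDs without a "yes" are dropped
result <- df[!is.na(first) & pos >= first, ]
result
```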
Upvotes: 1
Reputation: 101538
A base R option using ave + subset:
subset(
df,
ave(x == "yes", ID, FUN = cumsum) > 0
)
gives
ID x y
2 1 yes 2
3 1 no 3
4 1 no 4
7 2 yes 3
8 2 no 4
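The intermediate vector from ave explains why > 0 works: cumsum counts the 'yes' values seen so far within each ID, and because the count never decreases, it stays positive on every row from the first 'yes' onwards.

```r
df <- data.frame(ID = c(1,1,1,1,2,2,2,2),
                 x = c("no","yes","no","no","no","no","yes","no"),
                 y = c(1,2,3,4,1,2,3,4))

# Running count of "yes" values within each ID
flag <- ave(df$x == "yes", df$ID, FUN = cumsum)
flag            # 0 1 1 1 0 0 1 1
which(flag > 0) # rows kept by subset()
```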
A data.table option following the same idea as above is:
library(data.table)
# setDT converts df to a data.table in place (by reference)
setDT(df)[, .SD[cumsum(x == "yes") > 0], ID]
ID x y
1: 1 yes 2
2: 1 no 3
3: 1 no 4
4: 2 yes 3
5: 2 no 4
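These answers all rely on the rows already being in time order within each ID. If that is not guaranteed (for example, when y is a date and the data arrives unsorted), a reasonable precaution is to sort by ID and y first. The sketch below uses a hypothetical shuffled copy of the example data and sorts it in base R; arrange(ID, y) in dplyr or setorder(df, ID, y) in data.table should serve the same purpose.

```r
# Hypothetical shuffled copy of the example data (same rows, different order)
df <- data.frame(ID = c(2,1,2,1,1,2,1,2),
                 x = c("yes","no","no","no","yes","no","no","no"),
                 y = c(3,4,1,1,2,4,3,2))

# Sort by time within each ID before applying the running count
df <- df[order(df$ID, df$y), ]
out <- subset(df, ave(x == "yes", ID, FUN = cumsum) > 0)
out
```

Without the sorting step, the running count would be taken in the shuffled row order and would keep or drop the wrong rows.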
Upvotes: 3
Reputation: 887158
After grouping by 'ID', create a logical expression with cumsum on x == 'yes': the running count of 'yes' values is greater than 0 from the first 'yes' onwards.
library(dplyr)
df %>%
group_by(ID) %>%
filter(cumsum(x == 'yes') > 0) %>%
ungroup
Output:
# A tibble: 5 x 3
# ID x y
# <dbl> <chr> <dbl>
#1 1 yes 2
#2 1 no 3
#3 1 no 4
#4 2 yes 3
#5 2 no 4
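A variant that compares times instead of row positions, so that row order does not matter: find the earliest y at which x is 'yes' within each ID and keep rows with y at or after it. This is a sketch assuming y is numeric; ifelse drops the Date class, so a Date column would need as.numeric first. IDs with no 'yes' get Inf and are dropped.

```r
df <- data.frame(ID = c(1,1,1,1,2,2,2,2),
                 x = c("no","yes","no","no","no","no","yes","no"),
                 y = c(1,2,3,4,1,2,3,4))

# Earliest time with x == "yes" in each ID; Inf if the ID never has a "yes"
first_yes <- ave(ifelse(df$x == "yes", df$y, Inf), df$ID, FUN = min)

# Keep rows at or after that time
result <- df[df$y >= first_yes, ]
result
```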
Upvotes: 2