Matthew Riordan

Reputation: 50

How can one sort data in R based on a condition being met and time ordering?

If I have the below dataset:

df <- data.frame(ID = c(1,1,1,1,2,2,2,2),
           x = c("no","yes","no","no","no","no","yes","no"),
           y = c(1,2,3,4,1,2,3,4))
ID x y
1 no 1
1 yes 2
1 no 3
1 no 4
2 no 1
2 no 2
2 yes 3
2 no 4

ID is an identifier, x tells us whether or not a condition has been met, and y is a time order unique to each ID (though in a real data set it would probably be a date). How can I remove rows where the condition was not met but keep rows where the condition was met, or the event occurred after the condition was met?

A final result should look like this:

ID x y
1 yes 2
1 no 3
1 no 4
2 yes 3
2 no 4

Upvotes: 3

Views: 797

Answers (3)

Ronak Shah

Reputation: 388982

You can use match to get the index of the first 'yes' for each ID and use it in filter or slice.

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(row_number() >= match('yes', x)) %>%
  ungroup()

With slice :

df %>%
  group_by(ID) %>%
  slice(match('yes', x):n()) %>%
  ungroup()
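
One caveat worth checking, sketched here in base R with made-up vectors (not the question's data): match returns NA when a group contains no 'yes'. The comparison used in the filter version then evaluates to NA, and as far as I know filter drops such rows. In the slice version the `:` sequence cannot be built from NA, so I would expect an error for that group.

```r
# Hypothetical x columns for two groups: one with a "yes", one without
x_with_yes <- c("no", "yes", "no", "no")
x_without_yes <- c("no", "no")

first_yes <- match("yes", x_with_yes)    # position of the first "yes"
no_yes <- match("yes", x_without_yes)    # NA, because nothing matches

# Same comparison the filter() call makes, written with seq_along()
keep <- seq_along(x_with_yes) >= first_yes
keep_na <- seq_along(x_without_yes) >= no_yes   # NA for every row

# NA cannot start a sequence, so the `:` form fails for such a group
seq_fails <- inherits(tryCatch(no_yes:2, error = function(e) e), "error")
```

If some IDs may never reach 'yes', the filter form is probably the safer of the two.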

Upvotes: 1

ThomasIsCoding

Reputation: 101538

A base R option using ave + subset

subset(
  df,
  ave(x == "yes", ID, FUN = cumsum) > 0
)

gives

  ID   x y
2  1 yes 2
3  1  no 3
4  1  no 4
7  2 yes 3
8  2  no 4
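
To see why this works, it may help to look at the intermediate vector. The sketch below rebuilds the data as shown in the question's printed table and inspects the per-ID running count; the names `df_demo`, `running` and `kept` are only for illustration.

```r
# Data as printed in the question (8 rows, first "yes" at y = 2 and y = 3)
df_demo <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  x  = c("no", "yes", "no", "no", "no", "no", "yes", "no"),
  y  = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Running count of "yes" within each ID: 0 before the first "yes",
# at least 1 from that row onward
running <- ave(df_demo$x == "yes", df_demo$ID, FUN = cumsum)

# Rows where the count is positive are the ones subset() keeps
kept <- df_demo[running > 0, ]
```

Here `running` is 0 1 1 1 for ID 1 and 0 0 1 1 for ID 2, so the rows with `running > 0` are exactly the five rows shown above.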

A data.table option following the same idea as above is

> library(data.table)
> setDT(df)[, .SD[cumsum(x == "yes") > 0], ID]
   ID   x y
1:  1 yes 2
2:  1  no 3
3:  1  no 4
4:  2 yes 3
5:  2  no 4

Upvotes: 3

akrun

Reputation: 887158

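After grouping by 'ID', create a logical condition with cumsum on x == 'yes'. The cumulative sum stays at 0 until the first 'yes' and is positive from that row onward, so filtering on it keeps the first 'yes' and every later row.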

library(dplyr)
df %>%
   group_by(ID) %>%
   filter(cumsum(x == 'yes') > 0) %>%
   ungroup()

-output

# A tibble: 5 x 3
#     ID x         y
#  <dbl> <chr> <dbl>
#1     1 yes       2
#2     1 no        3
#3     1 no        4
#4     2 yes       3
#5     2 no        4
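
All of the cumsum approaches rely on the rows already being in time order within each ID. The question mentions that y would probably be a date in real data, so here is a minimal base R sketch of sorting first. The data frame `df2` and its `date` column are invented for illustration and are not from the question.

```r
# Hypothetical data: dates instead of y, with rows deliberately out of order
df2 <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  x = c("no", "yes", "no", "yes", "no", "no"),
  date = as.Date(c("2021-01-03", "2021-01-02", "2021-01-01",
                   "2021-02-02", "2021-02-03", "2021-02-01"))
)

# Sort by ID, then by date, so cumsum runs in time order within each ID
df2 <- df2[order(df2$ID, df2$date), ]

# Keep rows from the first "yes" onward within each ID
result <- subset(df2, ave(x == "yes", ID, FUN = cumsum) > 0)
```

In dplyr the same idea would presumably be an arrange(ID, date) step before group_by.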

Upvotes: 2
