mdb_ftl
mdb_ftl

Reputation: 433

Select rows after condition in grouped data in R

I have the following data:

z <- data.frame(

  id = c("a", "a", "b", "b", "b", "c", "c", "c", "c"), 
  x = c(10, 4, 3, 7, 10, 3, 2, 6, 2)

)

I need to group by "id", then select all rows including and after any which meet the following criteria (x >=5 & x <=8)

Ideally my data should look like

id   x 
b    7
b    10
c    6
c    2  

I have tried the following without success. Any help appreciated.

z %>%
  group_by(id) %>%
  filter(row_number() >= min(which(x>= 5 & x<=8)))

Upvotes: 0

Views: 119

Answers (1)

r2evans
r2evans

Reputation: 160407

I think cumany (cumulative any) is what you need.

Basically, it does

cumany(c(F,F,T,F,F,F))
# [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

On your data:

library(dplyr)
z %>%
  group_by(id) %>%
  filter(cumany(between(x, 5, 8))) %>%
  ungroup()
# # A tibble: 4 x 2
#   id        x
#   <chr> <dbl>
# 1 b         7
# 2 b        10
# 3 c         6
# 4 c         2

You can see what this is doing by adding it as a variable, just for demonstration:

z %>%
  group_by(id) %>%
  mutate(keep = cumany(between(x, 5, 8))) %>%
  ungroup()
# # A tibble: 9 x 3
#   id        x keep 
#   <chr> <dbl> <lgl>
# 1 a        10 FALSE
# 2 a         4 FALSE
# 3 b         3 FALSE
# 4 b         7 TRUE     # every 'keep' in id='b' after this will be TRUE
# 5 b        10 TRUE 
# 6 c         3 FALSE
# 7 c         2 FALSE
# 8 c         6 TRUE     # ditto, id='c'
# 9 c         2 TRUE 

Upvotes: 1

Related Questions