game_of_lies

Reputation: 55

R: dplyr How to group by then filter rows based on the condition of each group's first row

I have a simple data frame:

df <- data.frame(x=c(1,1,1,1,2,2,2,3,3,3),
                 y=c('a','b','a','c','e','d','e','a','f','c'))


I want to group by x and then, if the first row of an x-group has y == 'a', keep only the rows of that group where y == 'a' | y == 'c'. Groups whose first row does not have y == 'a' should be dropped.

So the expected result contains rows 1, 3, 4, 8 and 10 (group 2 is dropped entirely because its first y is 'e').

Thank you very much.

Upvotes: 4

Views: 1865

Answers (2)

ThomasIsCoding

Reputation: 101343

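Here is another option that needs no group_by(): base R's ave() spreads each x-group's first value of y == "a" to every row of that group, and dplyr's filter() combines it with the %in% check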

> library(dplyr)
> df %>%
+   filter(y %in% c("a", "c") & ave(y == "a", x, FUN = first))
  x y
1 1 a
2 1 a
3 1 c
4 3 a
5 3 c
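To see what the ave() part contributes on its own, here is a small base R sketch. It uses function(v) v[1] in place of dplyr's first() so the snippet runs without any packages; otherwise it is the same call. ave() splits the logical vector by x, applies FUN to each piece and writes the single result back to every row of that group, so the vector already carries the per-group "first row is 'a'" flag without group_by().

```r
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 y = c('a', 'b', 'a', 'c', 'e', 'd', 'e', 'a', 'f', 'c'))

# Split (y == "a") by x, take the first element of each piece,
# and recycle that value over all rows of the same group
first_is_a <- ave(df$y == "a", df$x, FUN = function(v) v[1])
first_is_a
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

ANDing this with y %in% c("a", "c") leaves exactly rows 1, 3, 4, 8 and 10.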

Upvotes: 0

akrun

Reputation: 887118

After grouping by 'x', combine two conditions (comma-separated conditions in filter() are joined with &): 1) the first value of 'y' in the group is 'a', and 2) 'y' is one of 'a' or 'c'

library(dplyr)
df %>%
   group_by(x) %>%
   filter('a' == first(y), y %in% c('a', 'c')) %>%
   ungroup

-output

# A tibble: 5 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     1 a    
3     1 c    
4     3 a    
5     3 c 
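For comparison, a base R sketch of the same two conditions, without grouping. This is an alternative technique, not part of the answer above: match(df$x, df$x) returns, for each row, the index of the first row with the same x value, so df$y[match(df$x, df$x)] is each group's first y repeated on every row of that group. Subsetting with [ keeps the original row names, which makes it easy to compare with the rows expected in the question.

```r
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 y = c('a', 'b', 'a', 'c', 'e', 'd', 'e', 'a', 'f', 'c'),
                 stringsAsFactors = FALSE)

# First y of each row's x-group (value at the first occurrence of that x)
first_y <- df$y[match(df$x, df$x)]

# Condition 1: the group's first y is 'a'; condition 2: y is 'a' or 'c'
res <- df[first_y == "a" & df$y %in% c("a", "c"), ]
res
```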

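If there are additional rules, create a named list whose names are the expected first values of 'y' and whose elements are the vectors of values to keep. Then extract the list element that matches the group's first value of 'y' and use that vector with %in%. If a group's first value has no entry in the list, [[ returns NULL, and y %in% NULL is FALSE, so that group is dropped (group 2 would be dropped if the 'e' rule were removed).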

df %>%
    group_by(x) %>%
    filter(y %in% list(a = c('a', 'c'), e = 'e')[[first(y)]]) %>%
    ungroup

-output

# A tibble: 7 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     1 a    
3     1 c    
4     2 e    
5     2 e    
6     3 a    
7     3 c   
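The same lookup-by-first-value idea can be sketched in base R, which also makes the "no rule means drop the group" behaviour easy to check. This is an illustration, not part of the answer; rules is just a name chosen here for the named list. stringsAsFactors = FALSE is set explicitly because data.frame() turned y into a factor by default before R 4.0.0, and indexing a list with a factor uses the factor's integer codes rather than its labels.

```r
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 y = c('a', 'b', 'a', 'c', 'e', 'd', 'e', 'a', 'f', 'c'),
                 stringsAsFactors = FALSE)

# Name = expected first y of a group, value = y values to keep in that group
rules <- list(a = c('a', 'c'), e = 'e')

# First y of each row's x-group
first_y <- df$y[match(df$x, df$x)]

# rules[[first_y[i]]] is NULL when the first value has no rule,
# and anything %in% NULL is FALSE, so such groups disappear
keep <- vapply(seq_len(nrow(df)),
               function(i) df$y[i] %in% rules[[first_y[i]]],
               logical(1))
res <- df[keep, ]
res
```

This keeps rows 1, 3, 4, 5, 7, 8 and 10, the same seven rows as the grouped filter() version.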

Upvotes: 4
