apples-oranges

Reputation: 987

Filter rows within groups based on multiple conditions

I have a data set where I would like to filter rows within different groups.

Given this dataframe:

group = as.factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))   
fruit = as.factor(c("apples", "apples", "apples", "oranges", 
                    "oranges", "apples", "oranges",
                    "bananas", "bananas", "oranges", "bananas")) 
hit = c(1, 0, 1, 1, 
        0, 1, 1,
        1, 0, 0, 1)

dt = data.frame(group, fruit, hit) 
dt
   group   fruit hit
      1  apples   1
      1  apples   0
      1  apples   1
      1 oranges   1
      2 oranges   0
      2  apples   1
      2 oranges   1
      3 bananas   1
      3 bananas   0
      3 oranges   0
      3 bananas   1

I would like to use the first occurrence of fruit within each group to filter that group. There is a second condition: I only want to keep the rows of that fruit where hit is equal to 1.

So, for group 1, apples is the first occurrence, and it has a positive hit twice, so I would like to keep those two rows.

The result would look like this:

  group   fruit hit
     1  apples   1
     1  apples   1
     2 oranges   1
     3 bananas   1
     3 bananas   1

I know you can filter with dplyr, but I am not sure how to achieve this.

Upvotes: 2

Views: 6620

Answers (1)

akrun

Reputation: 887891

We can use dplyr. After grouping by 'group', filter the rows where 'hit' is not equal to 0 and (&) 'fruit' equals the first element of 'fruit' in that group.

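library(dplyr)
dt %>%
   # evaluate the filter separately within each group
   group_by(group) %>%
   # keep rows with a non-zero hit whose fruit matches the group's first fruit
   filter(hit != 0 & fruit == first(fruit))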
#   group   fruit   hit
#  <fctr>  <fctr> <dbl>
#1      1  apples     1
#2      1  apples     1
#3      2 oranges     1
#4      3 bananas     1
#5      3 bananas     1
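
For readers without dplyr, the same logic can be sketched in base R. This is only an illustrative alternative, not part of the original answer: it rebuilds the question's data (with fruit as a plain character vector for simplicity), and the names first_rows, first_fruit, and res are made up for this example. The idea is to take the first row of each group with duplicated(), look up that group's first fruit for every row with match(), and then apply both conditions.

```r
# Rebuild the example data from the question
# (fruit kept as character here; an assumption made for simplicity)
dt <- data.frame(
  group = factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
  fruit = c("apples", "apples", "apples", "oranges",
            "oranges", "apples", "oranges",
            "bananas", "bananas", "oranges", "bananas"),
  hit = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# First row of each group: duplicated() is FALSE the first time a group appears
first_rows <- dt[!duplicated(dt$group), ]

# For every row, look up the first fruit of that row's group
first_fruit <- first_rows$fruit[match(dt$group, first_rows$group)]

# Keep rows whose fruit is the group's first fruit and whose hit is not 0
res <- dt[dt$fruit == first_fruit & dt$hit != 0, ]
res
```

Unlike the dplyr version, this keeps the original row names of dt, which can help when tracing which rows survived the filter.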

Upvotes: 6
