Reputation: 987
I have a data set where I would like to filter rows within different groups.
Given this dataframe:
group = as.factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
fruit = as.factor(c("apples", "apples", "apples", "oranges",
                    "oranges", "apples", "oranges",
                    "bananas", "bananas", "oranges", "bananas"))
hit = c(1, 0, 1, 1,
        0, 1, 1,
        1, 0, 0, 1)
dt = data.frame(group, fruit, hit)
dt
group fruit hit
1 apples 1
1 apples 0
1 apples 1
1 oranges 1
2 oranges 0
2 apples 1
2 oranges 1
3 bananas 1
3 bananas 0
3 oranges 0
3 bananas 1
I would like to use the first occurrence of fruit within a group to filter the groups. But there is another condition: I would only like to keep the rows of that fruit where hit is equal to 1.
So, for group 1, apples is the first occurrence, and it has a positive hit twice, so I would like to keep those two rows.
The result would look like this:
group fruit hit
1 apples 1
1 apples 1
2 oranges 1
3 bananas 1
3 bananas 1
I know you can filter with dplyr, but I am not sure how to achieve this with it.
Upvotes: 2
Views: 6620
Reputation: 887891
We can use dplyr. After grouping by 'group', filter the rows where 'hit' is not equal to 0 and (&) 'fruit' equals the first element of 'fruit' within that group.
library(dplyr)
dt %>%
  group_by(group) %>%
  # keep rows with a hit whose fruit matches the group's first fruit
  filter(hit != 0 & fruit == first(fruit))
# group fruit hit
# <fctr> <fctr> <dbl>
#1 1 apples 1
#2 1 apples 1
#3 2 oranges 1
#4 3 bananas 1
#5 3 bananas 1
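For comparison, the same filter can probably be written in base R without dplyr. This is only a minimal sketch, assuming the dt from the question; first_fruit and result are names introduced here for illustration. ave() repeats each group's first fruit on every row of that group, so the comparison can then be done row by row.

```r
# Rebuild the example data from the question so the sketch is self-contained
group <- as.factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
fruit <- as.factor(c("apples", "apples", "apples", "oranges",
                     "oranges", "apples", "oranges",
                     "bananas", "bananas", "oranges", "bananas"))
hit <- c(1, 0, 1, 1,
         0, 1, 1,
         1, 0, 0, 1)
dt <- data.frame(group, fruit, hit)

# first_fruit (illustrative name): the first fruit of each group,
# repeated so that it lines up with every row of that group
first_fruit <- ave(as.character(dt$fruit), dt$group,
                   FUN = function(x) x[1])

# Keep rows that have a hit and whose fruit is the group's first fruit
result <- dt[dt$hit == 1 & as.character(dt$fruit) == first_fruit, ]
result
```

Unlike the dplyr version, result keeps the original row names, which can be handy for tracing rows back to dt.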
Upvotes: 6