Reputation: 11
I have a question regarding using dplyr
to filter a dataset.
I want to group data by its RestaurantID
and then filter()
all groups where the wage >= 5
in Year==1992
.
For example:
I have:
RestaurantID Year Wage
1 92 6
1 93 4
2 92 3
2 93 4
3 92 5
3 93 5
Dataset I want (where it keeps all groups that had a wage value in 1992 that was >= 5)
RestaurantID Year Wage
1 92 6
1 93 4
3 92 5
3 93 5
I tried:
data %>% group_by("RestaurantID") %>% filter(any(Wage>= '5', Year =='92'))
But this gives me all rows where wage is >=5
.
Upvotes: 1
Views: 1551
Reputation: 70266
It's ok to have a single TRUE value per ID if you want all rows of that group returned. In that case, the TRUE value is recycled to the length of that group and hence all rows are returned.
df %>% group_by(RestaurantID) %>% filter(Wage[Year == 92] >= 5)
## A tibble: 4 x 3
## Groups: RestaurantID [2]
# RestaurantID Year Wage
# <int> <int> <int>
#1 1 92 6
#2 1 93 4
#3 3 92 5
#4 3 93 5
Please note that when comparing numbers, you shouldn't put them in quote them like '5' because that turns the numbers into characters.
Alternatively, you could modify your original approach to:
df %>% group_by(RestaurantID) %>% filter(any(Wage >= 5 & Year == 92))
which also returns the correct subset.
Upvotes: 2
Reputation: 887118
We could do this without grouping using filter
library(dplyr)
df1 %>%
filter(RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
# RestaurantID Year Wage
#1 1 92 6
#2 1 93 4
#3 3 92 5
#4 3 93 5
or the same logic with base R
subset(df1, RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
# RestaurantID Year Wage
#1 1 92 6
#2 1 93 4
#5 3 92 5
#6 3 93 5
Upvotes: 4