Jasmin
Jasmin

Reputation: 11

group data and filter groups by two columns (dplyr)

I have a question regarding using dplyr to filter a dataset.

I want to group data by its RestaurantID and then filter() all groups where the wage >= 5 in Year==1992.

For example:

I have:

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     2             92          3
     2             93          4
     3             92          5
     3             93          5

Dataset I want (where it keeps all groups that had a wage value in 1992 that was >= 5)

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     3             92          5
     3             93          5

I tried:

data %>% group_by("RestaurantID") %>% filter(any(Wage>= '5', Year =='92')) 

But this gives me all rows where wage is >=5.

Upvotes: 1

Views: 1551

Answers (2)

talat
talat

Reputation: 70266

It's ok to have a single TRUE value per ID if you want all rows of that group returned. In that case, the TRUE value is recycled to the length of that group and hence all rows are returned.

df %>% group_by(RestaurantID) %>% filter(Wage[Year == 92] >= 5)
## A tibble: 4 x 3
## Groups:   RestaurantID [2]
#  RestaurantID  Year  Wage
#         <int> <int> <int>
#1            1    92     6
#2            1    93     4
#3            3    92     5
#4            3    93     5

Please note that when comparing numbers, you shouldn't put them in quote them like '5' because that turns the numbers into characters.

Alternatively, you could modify your original approach to:

df %>% group_by(RestaurantID) %>% filter(any(Wage >= 5 & Year == 92))

which also returns the correct subset.

Upvotes: 2

akrun
akrun

Reputation: 887118

We could do this without grouping using filter

library(dplyr)
df1 %>% 
    filter(RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#3            3   92    5
#4            3   93    5

or the same logic with base R

subset(df1, RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#5            3   92    5
#6            3   93    5

Upvotes: 4

Related Questions