maldini425

Thoughts on Subsetting data, using filter() function

I have a dataset from a region where many schools are segregated by gender. I want to compare the performance of male and female students within the same school, so I need to limit my data to schools that teach both genders. In other words, I want to remove schools that teach only females or only males.

Below is my current code, but it returns zero observations even though the data includes several schools that teach both genders:

## Limit Riyadh schools only to schools teaching both genders
two_gender_schools <- filter(riyadh_scores, school_name == "",
                             gender == "male", gender == "female")
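A minimal reproducible sketch of the problem, assuming a small made-up data frame (the columns school_name, gender and score and all of the values are hypothetical stand-ins for the real riyadh_scores data). Running the same filter on it returns an empty result:

```r
library(dplyr)

# Hypothetical toy data: school A teaches both genders,
# B teaches only males, C teaches only females
riyadh_scores <- data.frame(
  school_name = c("A", "A", "B", "B", "C"),
  gender      = c("male", "female", "male", "male", "female"),
  score       = c(71, 78, 65, 80, 90),
  stringsAsFactors = FALSE
)

# The original attempt: all three conditions must hold on the same row
two_gender_schools <- filter(riyadh_scores, school_name == "",
                             gender == "male", gender == "female")

nrow(two_gender_schools)
#> [1] 0
```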

Is there an efficient way to subset my data without manually specifying the name of every school that teaches both genders?

Answers (1)

Gregor Thomas

When you give filter() several conditions separated by commas, it combines them with "and" (&). So your code looks for rows where the school name is blank (school_name == ""), and the gender is "male", and the gender is "female". No single row can have both gender values at once, so nothing matches.
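A quick check on hypothetical toy data (made-up values, not the asker's dataset) illustrates both points: comma-separated conditions give the same rows as an explicit &, and the two gender tests are never TRUE together on any row.

```r
library(dplyr)

# Hypothetical toy data for illustration only
df <- data.frame(
  school_name = c("A", "A", "B"),
  gender      = c("male", "female", "male"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions in filter() ...
with_commas <- filter(df, gender == "male", gender == "female")
# ... behave exactly like conditions joined with &
with_and    <- filter(df, gender == "male" & gender == "female")

identical(with_commas, with_and)
#> [1] TRUE

# Evaluated row by row, "male" and "female" are never both TRUE
df$gender == "male" & df$gender == "female"
#> [1] FALSE FALSE FALSE
```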

Instead, use group_by(school_name) so that the condition is evaluated across all rows of each school rather than one row at a time, and filter from there. A couple of options:

library(dplyr)

# Option a: keep schools where both "female" and "male" appear
two_gender_schools_a <- riyadh_scores %>%
  group_by(school_name) %>%
  # %in% checks anywhere in the group, not row by row
  filter("female" %in% gender & "male" %in% gender) %>%
  ungroup()

# Option b: keep schools that have more than 1 distinct value for gender
two_gender_schools_b <- riyadh_scores %>%
  group_by(school_name) %>%
  filter(n_distinct(gender) > 1) %>%
  ungroup()
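As a sketch of how the two options behave, the following runs them on hypothetical toy data (made-up schools and scores, not the real Riyadh dataset). One difference worth knowing: n_distinct() counts NA as a distinct value by default (na.rm = FALSE), so a single-gender school with a missing gender entry passes option b unless you add na.rm = TRUE. The %in% test in option a is not affected by that.

```r
library(dplyr)

# Hypothetical toy data: A teaches both genders, B only males,
# C only females, D only males but with one missing gender value
riyadh_scores <- data.frame(
  school_name = c("A", "A", "B", "B", "C", "D", "D"),
  gender      = c("male", "female", "male", "male", "female", "male", NA),
  score       = c(71, 78, 65, 80, 90, 70, 75),
  stringsAsFactors = FALSE
)

# Option a: both labels must appear somewhere in the school's rows
two_gender_schools_a <- riyadh_scores %>%
  group_by(school_name) %>%
  filter("female" %in% gender & "male" %in% gender) %>%
  ungroup()

# Option b: the NA in school D counts as a second distinct value
two_gender_schools_b <- riyadh_scores %>%
  group_by(school_name) %>%
  filter(n_distinct(gender) > 1) %>%
  ungroup()

# Option b with missing values ignored
two_gender_schools_b_na <- riyadh_scores %>%
  group_by(school_name) %>%
  filter(n_distinct(gender, na.rm = TRUE) > 1) %>%
  ungroup()

unique(two_gender_schools_a$school_name)
#> [1] "A"
unique(two_gender_schools_b$school_name)
#> [1] "A" "D"
unique(two_gender_schools_b_na$school_name)
#> [1] "A"
```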
