arnyeinstein

Reputation: 1013

Using dplyr::filter on two variables with a negation

I am trying to filter out the rows of a data table that match two conditions at once. I tried the following, but it drops every row that matches either one of the two conditions.

filter(starwars, hair_color != "none" && eye_color != "black") 

It must be simple, but I don't see it. Help would be appreciated.

Cheers, Renger

Upvotes: 1

Views: 688

Answers (3)

Jens

Reputation: 2439

When filtering, AND and OR are easy to mix up.

You should use | instead of & to get what you want:

starwars %>% filter(hair_color != "none" | eye_color != "black")

This follows from how logical operators work. filter() keeps the rows for which the condition is TRUE. With & both comparisons must be TRUE for a row to be kept, so a row is dropped as soon as hair_color is "none" or eye_color is "black". With | a row is kept when at least one comparison is TRUE, so only the rows where hair_color is "none" and eye_color is "black" are dropped. This trips me up all the time, but that is how the logic works. It helps to look at a Venn diagram and work through a small example.
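To make the AND/OR difference concrete, here is a minimal base R sketch that lists all four combinations of the two matches. The data frame `cases` and its column names are made up for illustration and are not part of starwars; `hair_is_none` stands for `hair_color == "none"` and `eye_is_black` for `eye_color == "black"`.

```r
# Hypothetical toy table: every combination of the two "unwanted" matches.
cases <- expand.grid(
  hair_is_none = c(TRUE, FALSE),
  eye_is_black = c(TRUE, FALSE)
)

# filter() keeps a row when its condition is TRUE.
# hair_color != "none" corresponds to !hair_is_none, likewise for the eyes.
cases$kept_with_and <- !cases$hair_is_none & !cases$eye_is_black
cases$kept_with_or  <- !cases$hair_is_none | !cases$eye_is_black

# De Morgan's law: !A | !B is the same as !(A & B).
cases$kept_not_both <- !(cases$hair_is_none & cases$eye_is_black)

print(cases)
```

In the printed table, the & column keeps only the combination where neither match occurs, while the | column drops only the combination where both occur, which is the behaviour asked for in the question.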

In the end, I prefer the solution scoa gave as it is more intuitive.

Upvotes: 0

DataTx

Reputation: 1869

It depends on whether you want to drop rows that match both conditions or rows that match either one.

To drop rows that match both conditions, use:

   starwars %>% filter(!(hair_color == "none" & eye_color == "black"))

To drop rows that match one condition OR the other, use:

   starwars %>% filter(!(hair_color == "none" | eye_color == "black"))

Upvotes: 1

scoa

Reputation: 19867

I find it easier to read when you first specify the group you want to exclude, then exclude it:

filter(starwars, !(hair_color == "none" & eye_color == "black")) 
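As a quick check, the "exclude the group" form can be compared with the equivalent | form on the starwars data that ships with dplyr. This is only a sketch; the variable names `not_both` and `either_differs` are made up for illustration.

```r
library(dplyr)

# Name the group to exclude, then negate it.
not_both <- filter(starwars, !(hair_color == "none" & eye_color == "black"))

# The same condition rewritten with De Morgan's law.
either_differs <- filter(starwars, hair_color != "none" | eye_color != "black")

# Both forms should select the same characters.
# Note: filter() also drops rows where the condition evaluates to NA,
# and both forms evaluate to NA for the same rows.
identical(not_both$name, either_differs$name)
```

Both forms express the same condition, so the choice between them is purely a matter of readability.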

Upvotes: 2
