Reputation: 23
Very new to R. I have a large text-based df that I would like to run some checks on. I want to find which values in one column ('colour') occur with both of two distinct values ('a' and 'b') in another column ('letter'). This should be an AND, not an OR, type of query. The df looks like this:
Data
df <- structure(list(colour = c("blue", "blue", "red", "red", "red",
  "purple", "purple"), letter = c("a", "c", "a", "m", "b", "a",
  "b")), class = "data.frame", row.names = c(NA, -7L))
colour letter
blue a
blue c
red a
red m
red b
purple a
purple b
I think the best way to do this is by subsetting, such that I get a new df ('df2') with the relevant data, which should look like this:
colour letter
red a
red b
purple a
purple b
I tried the following dplyr commands, but I don't get the right results ('blue a' is still there).
df2 <- df %>% group_by(colour) %>% filter(letter %in% c('a', 'b'))
I'd appreciate any help I can get!
Upvotes: 1
Views: 1155
Reputation: 145755
letter %in% c('a', 'b') checks each letter to see whether it's in the set {a, b} (that is, it returns TRUE for each letter that is 'a' or 'b'), and keeps those rows. What you want to do is check that there is both an 'a' in the group and a 'b' in the group:
df %>%
  group_by(colour) %>%
  filter('a' %in% letter & 'b' %in% letter)

## or, if you have more than a couple of letters (maybe a vector of letters)
df %>%
  group_by(colour) %>%
  filter(all(c('a', 'b') %in% letter))
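To see the difference between the two directions of %in%, here is a minimal, self-contained sketch using the data from the question. The names wanted, blue_letters, elementwise, groupwise and kept are only illustrative, and the trailing ungroup() is an optional addition that is not part of the code above.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

wanted <- c("a", "b")  # illustrative name for the letters a group must contain

# The two directions of %in% on the 'blue' group alone
blue_letters <- df$letter[df$colour == "blue"]  # "a" "c"
elementwise  <- blue_letters %in% wanted        # one result per row: TRUE FALSE
groupwise    <- all(wanted %in% blue_letters)   # one result per group: FALSE

# Group-level check: keep every row of a colour that has both letters
kept <- df %>%
  group_by(colour) %>%
  filter(all(wanted %in% letter)) %>%
  ungroup()  # optional: drop the grouping once the filter is done

print(kept)
```

With this data, 'blue' is dropped because it has no 'b', while all rows of 'red' and 'purple' are kept, including the 'red m' row.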
It's not clear from your text or example what should happen if a group contains 'a', 'b', and another letter, say 'c'. The code above will keep the whole group as long as there is an 'a' and a 'b' in it.
If you want to keep only the 'a' and 'b' rows of the group (in case there are more), keep the filter condition you had as well:
... filter(all(c('a', 'b') %in% letter), letter %in% c('a', 'b'))
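Written out in full against the data from the question, the combined condition could look like the sketch below. The name wanted and the trailing ungroup() are illustrative additions. The first condition yields one value per group and the second one value per row; filter() keeps a row only when both are TRUE.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

wanted <- c("a", "b")  # illustrative name for the required letters

df2 <- df %>%
  group_by(colour) %>%
  filter(
    all(wanted %in% letter),  # group-level: the colour has both letters
    letter %in% wanted        # row-level: keep only the 'a' and 'b' rows
  ) %>%
  ungroup()

print(df2)
```

On this data the result has the four rows red/a, red/b, purple/a and purple/b, which matches the df2 shown in the question.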
If you want to keep only groups that have 'a' and 'b' and no other letters, then I would do this:
... filter(all(c('a', 'b') %in% letter) & n_distinct(letter) == 2)
## another alternative
... filter(all(c('a', 'b') %in% letter) & all(letter %in% c('a', 'b')))
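As a rough check that the two strict variants agree, here is a sketch on the question's data. The names wanted, by_count and by_membership are illustrative. Comparing against length(wanted) instead of a hard-coded 2 is a small variation that assumes wanted contains no duplicates.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

wanted <- c("a", "b")  # illustrative; assumed to contain no duplicates

# Variant 1: both letters present and no other distinct letters
by_count <- df %>%
  group_by(colour) %>%
  filter(all(wanted %in% letter) & n_distinct(letter) == length(wanted)) %>%
  ungroup()

# Variant 2: both letters present and every letter is one of the wanted ones
by_membership <- df %>%
  group_by(colour) %>%
  filter(all(wanted %in% letter) & all(letter %in% wanted)) %>%
  ungroup()

print(by_count)
print(by_membership)
```

Both variants keep only the two 'purple' rows here: 'red' has both letters but also an 'm', and 'blue' has no 'b'.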
Upvotes: 1