I have a dataset with two columns, source and id. I want to check which of the ids in the id column are present in both categories of the source column.
Here is an example:
dd <- read.table(text="
source id
a 1
a 2
a 3
b 1
b 3
b 4",
header=TRUE, stringsAsFactors=FALSE)
For this example, I want to get the ids 1 and 3, because they occur in both categories.
Is there a way to retrieve all such ids, e.g. with dplyr?
One option could be:
library(dplyr)
# keep the rows of every id that occurs under exactly 2 distinct sources
dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == 2)
source id
<chr> <int>
1 a 1
2 a 3
3 b 1
4 b 3
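The filter keeps every matching row. If, as in the question, you only want the ids themselves, one possible sketch (assuming dplyr is installed) collapses the data to one row per id with summarise() and then extracts the column with pull(). The helper column name n_src is just an illustrative choice, not something from the original answer.

```r
library(dplyr)

# Example data from the question
dd <- read.table(text = "
source id
a 1
a 2
a 3
b 1
b 3
b 4",
header = TRUE, stringsAsFactors = FALSE)

# One row per id, counting the distinct sources each id occurs in;
# then keep only ids seen in every source and return them as a plain vector.
# n_src is a hypothetical helper column name chosen for this sketch.
common_ids <- dd %>%
  group_by(id) %>%
  summarise(n_src = n_distinct(source)) %>%
  filter(n_src == n_distinct(dd$source)) %>%
  pull(id)

print(common_ids)
# [1] 1 3
```

Comparing against n_distinct(dd$source) rather than a hard-coded 2 means the same pipeline keeps working if more categories are added.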
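If there can be any number of categories, compare against the total number of distinct sources instead of hard-coding 2:
dd %>%
  # n_dist: distinct sources in the whole data, computed before grouping
  mutate(n_dist = n_distinct(source)) %>%
  group_by(id) %>%
  filter(n_distinct(source) == n_dist) %>%
  select(-n_dist)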
Or a somewhat less tidy version:
dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == n_distinct(dd$source))
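For comparison, a base R sketch (no dplyr needed) that also handles any number of categories: split() groups the ids by source, and Reduce() folds intersect() across the groups, leaving only the ids present in every category. This is an alternative technique, not part of the answer above.

```r
# Example data from the question
dd <- read.table(text = "
source id
a 1
a 2
a 3
b 1
b 3
b 4",
header = TRUE, stringsAsFactors = FALSE)

# Named list of id vectors, one per source: a = 1 2 3, b = 1 3 4
ids_by_source <- split(dd$id, dd$source)

# Intersect the id vectors pairwise across all sources
common_ids <- Reduce(intersect, ids_by_source)

print(common_ids)
# [1] 1 3
```

Unlike the filter() versions, this returns just the vector of ids rather than the matching rows.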