freakimkaefig

Reputation: 419

Check for occurrence in both categories

I have a dataset with two columns, source and id. I want to find which ids in the id column appear under both categories of the source column.

Here is an example:

dd <- read.table(text="
source id
a      1
a      2
a      3
b      1
b      3
b      4", 
header=TRUE, stringsAsFactors=FALSE)

For this example I want to get the ids 1 and 3, because they occur in both categories.

Is there a way to retrieve all the ids with e.g. dplyr?

Upvotes: 4

Views: 101

Answers (1)

tmfmnk

Reputation: 39858

One option could be:

library(dplyr)

# dd is the data frame from the question
dd %>%
 group_by(id) %>%
 filter(n_distinct(source) == 2)  # keep ids that occur in both categories

  source    id
  <chr>  <int>
1 a          1
2 a          3
3 b          1
4 b          3
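
If only the id values themselves are needed rather than the filtered rows, the result can be reduced to a plain vector. This is a minimal sketch, assuming the question's data frame `dd` and an installed dplyr; the name `ids_in_both` is only illustrative:

```r
library(dplyr)

# Example data from the question
dd <- read.table(text = "
source id
a      1
a      2
a      3
b      1
b      3
b      4",
header = TRUE, stringsAsFactors = FALSE)

ids_in_both <- dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == 2) %>%  # ids present in both categories
  ungroup() %>%
  distinct(id) %>%                     # one row per id
  pull(id)                             # extract the column as a vector

ids_in_both
# [1] 1 3
```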

If there could be n categories:

dd %>%
 mutate(n_dist = n_distinct(source)) %>%   # total number of categories
 group_by(id) %>%
 filter(n_distinct(source) == n_dist) %>%  # keep ids seen in every category
 select(-n_dist)

Or a somewhat less tidy version:

dd %>%
 group_by(id) %>%
 filter(n_distinct(source) == n_distinct(dd$source))
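
For comparison, the same idea can be sketched without dplyr by intersecting the ids of each category. This is a swapped-in base R approach rather than the dplyr pipelines above, and it assumes the question's data frame `dd`; `ids_by_source` and `common_ids` are illustrative names:

```r
# Example data from the question
dd <- read.table(text = "
source id
a      1
a      2
a      3
b      1
b      3
b      4",
header = TRUE, stringsAsFactors = FALSE)

# One vector of ids per category of source
ids_by_source <- split(dd$id, dd$source)

# Successively intersect the vectors; works for any number of categories
common_ids <- Reduce(intersect, ids_by_source)

common_ids
# [1] 1 3
```

This returns only the ids, not the matching rows; `dd[dd$id %in% common_ids, ]` would recover the rows if they are needed.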

Upvotes: 4
