Check for occurrence in both categories

Question

I have a dataset with two columns, source and id. I want to find which ids in the id column are present in both categories of the source column.

Here is an example:

dd <- read.table(text="
source id
a      1
a      2
a      3
b      1
b      3
b      4", 
header=TRUE, stringsAsFactors=FALSE)

For this example I want to get the ids 1 and 3, because they occur in both categories.

Is there a way to retrieve all the ids with e.g. dplyr?

tmfmnk · Accepted Answer

One option could be:

library(dplyr)

# dd is the data frame from the question
dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == 2)

  source    id
  <chr>  <int>
1 a          1
2 a          3
3 b          1
4 b          3
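
The result above keeps every matching row, while the question asks for the ids themselves. A minimal sketch of one way to reduce it to a plain vector, assuming the `dd` data frame from the question (the name `ids_in_both` is only illustrative):

```r
library(dplyr)

# Example data copied from the question
dd <- read.table(text = "
source id
a      1
a      2
a      3
b      1
b      3
b      4",
header = TRUE, stringsAsFactors = FALSE)

ids_in_both <- dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == 2) %>%  # keep ids seen in both categories
  ungroup() %>%
  distinct(id) %>%                     # one row per remaining id
  pull(id)                             # extract the column as a vector

ids_in_both
# [1] 1 3
```

Calling `ungroup()` before `distinct()` is not strictly required here, but it avoids carrying the grouping into later steps.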

If there could be n categories:

dd %>%
  mutate(n_dist = n_distinct(source)) %>%   # total number of categories
  group_by(id) %>%
  filter(n_distinct(source) == n_dist) %>%  # id must appear in all of them
  select(-n_dist)

Or a somewhat less tidy version:

dd %>%
  group_by(id) %>%
  filter(n_distinct(source) == n_distinct(dd$source))
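
To check that the general version behaves as expected with more than two categories, here is a small sketch on made-up data. The data frame `dd3` and the name `in_all` are hypothetical and not part of the question:

```r
library(dplyr)

# Hypothetical data with three categories: only id 1 occurs in a, b and c
dd3 <- data.frame(
  source = c("a", "a", "b", "b", "c", "c"),
  id     = c(1L, 2L, 1L, 3L, 1L, 3L),
  stringsAsFactors = FALSE
)

in_all <- dd3 %>%
  mutate(n_dist = n_distinct(source)) %>%   # 3 categories overall
  group_by(id) %>%
  filter(n_distinct(source) == n_dist) %>%  # keep ids present in all 3
  ungroup() %>%
  select(-n_dist)

unique(in_all$id)
# [1] 1
```

Here id 3 is dropped because it appears in only two of the three categories, which is the case the hard-coded `== 2` check would get wrong.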
