How to get set differences and intersection with dplyr group piping

Question

I have the following data frame:

library(tidyverse)

dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
) %>% 
  arrange(category, status, content)


dat

That looks like this:

> dat
  category status  content
     <dbl> <chr>   <chr>  
1        1 control A      
2        1 control B      
3        1 control Z      
4        1 treated A      
5        1 treated Z      
6        2 control C      
7        2 control D      
8        2 treated C      
9        2 treated F

What I want to do is to group it by category and then check the differences and the intersection of the content between control and treated.

The output for differences for control only:

category    differences_control_only
1           B
2           D

The output for differences for treated only:

category    differences_treated_only
1           not_available
2           F

The output for the intersection between treated and control:

  category      intersection
    1           A
    1           Z
    2           C

So at the end of the day, there will be 3 data frames as output. How can I achieve that?

In this example, the grouping is based on only one column (category). In real cases, the grouping can span multiple columns.

Ronak Shah · Accepted Answer

To get the differences, we can group by category and content and keep the groups that have only one distinct status. The status column of each remaining row shows which side the content belongs to.

library(dplyr)
dat %>% group_by(category, content) %>% filter(n_distinct(status) == 1)

#  category status  content
#     <dbl> <chr>   <chr>  
#1        1 control B      
#2        2 control D      
#3        2 treated F
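
The result above holds both kinds of difference in one data frame. One possible way to get the two separate data frames asked for in the question is to filter on status afterwards. This is only a sketch, not part of the original answer: the object names differences, control_only and treated_only are made up for illustration, and the "not_available" placeholder row is added with tidyr::complete() on the assumption that every category in dat should appear in the output.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as in the question
dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
) %>%
  arrange(category, status, content)

# Content that appears under exactly one status within a category
differences <- dat %>%
  group_by(category, content) %>%
  filter(n_distinct(status) == 1) %>%
  ungroup()

# Split by status and rename the content column (illustrative names)
control_only <- differences %>%
  filter(status == "control") %>%
  select(category, differences_control_only = content)

treated_only <- differences %>%
  filter(status == "treated") %>%
  select(category, differences_treated_only = content) %>%
  # Add a placeholder row for categories with no treated-only content
  complete(
    category = unique(dat$category),
    fill = list(differences_treated_only = "not_available")
  )
```

Without the complete() step, category 1 is simply absent from treated_only, which may be preferable if a missing row is easier to handle downstream than a placeholder string.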

To get the intersection, we can group by category and content and keep the groups that have more than one distinct status.

dat %>%
  group_by(category, content) %>% 
  filter(n_distinct(status) > 1) %>%
  distinct(category, content)

#  category content
#     <dbl> <chr>  
#1        1 A      
#2        1 Z      
#3        2 C
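
The question also mentions that real data may be grouped by several columns. A rough sketch of one way to handle that is to keep the grouping columns in a character vector and compute all three sets per group with base R's intersect() and setdiff(). This is an alternative to the n_distinct() approach, not the answer's method. It assumes dplyr 1.0.0 or later (for across() inside group_by()), and group_cols, sets and intersection_df are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as in the question
dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
) %>%
  arrange(category, status, content)

# Hypothetical: list every grouping column here; the example data has one
group_cols <- c("category")

# One row per group, with each set stored in a list column
sets <- dat %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(
    intersection = list(base::intersect(content[status == "control"],
                                        content[status == "treated"])),
    control_only = list(base::setdiff(content[status == "control"],
                                      content[status == "treated"])),
    treated_only = list(base::setdiff(content[status == "treated"],
                                      content[status == "control"])),
    .groups = "drop"
  )

# Expand one list column into a long data frame; groups whose set is
# empty are dropped by unnest() by default
intersection_df <- sets %>%
  select(all_of(group_cols), intersection) %>%
  unnest(intersection)
```

The same select() and unnest() pattern applied to control_only or treated_only gives the other two data frames.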
