littleworth

Reputation: 5169

How to get set differences and intersection with dplyr group piping

I have the following data frame:

library(tidyverse)

dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
) %>% 
  arrange(category, status, content)


dat

That looks like this:

> dat
  category status  content
     <dbl> <chr>   <chr>  
1        1 control A      
2        1 control B      
3        1 control Z      
4        1 treated A      
5        1 treated Z      
6        2 control C      
7        2 control D      
8        2 treated C      
9        2 treated F    

What I want to do is to group it by category and then check the differences and the intersection of the content between control and treated.

The output for differences for control only:

category    differences_control_only
1           B
2           D

The output for differences for treated only:

category    differences_treated_only
1           not_available
2           F

The output for the intersection between treated and control:

  category      intersection
    1           A
    1           Z
    2           C

So at the end of the day, there will be 3 data frames as output. How can I achieve that?

In this example the grouping is based on a single column (category), but in real cases the grouping can span multiple columns.

Upvotes: 0

Views: 686

Answers (2)

akrun

Reputation: 887028

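We can use data.table. This keeps the rows whose content appears under more than one status within a category, i.e. the intersection: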

library(data.table)
setDT(dat)[,  .SD[uniqueN(status) > 1], .(category, content)]

Upvotes: 1

Ronak Shah

Reputation: 388862

To get the differences, we can group_by category and content and keep the groups that have only one distinct status.

library(dplyr)
dat %>% group_by(category, content) %>% filter(n_distinct(status) == 1)

#  category status  content
#     <dbl> <chr>   <chr>  
#1        1 control B      
#2        2 control D      
#3        2 treated F         
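The question asks for the control-only and treated-only differences as separate data frames, with a placeholder where a category has no difference. A minimal sketch of one way to split the result above, assuming the placeholder string "not_available" and the output column names from the question (the object names diffs, control_only and treated_only are only illustrative):

```r
library(dplyr)
library(tibble)

dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
)

# Content that occurs under exactly one status within a category
diffs <- dat %>%
  group_by(category, content) %>%
  filter(n_distinct(status) == 1) %>%
  ungroup()

# Control-only differences
control_only <- diffs %>%
  filter(status == "control") %>%
  select(category, differences_control_only = content)

# Treated-only differences; joining onto every category keeps the
# categories without a difference, which get the placeholder value
treated_only <- dat %>%
  distinct(category) %>%
  left_join(
    diffs %>%
      filter(status == "treated") %>%
      select(category, differences_treated_only = content),
    by = "category"
  ) %>%
  mutate(differences_treated_only =
           coalesce(differences_treated_only, "not_available"))
```

The same left_join/coalesce step could be applied to control_only if some category might have no control-only content.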

To get the intersections, we can group_by category and content and keep the groups that have more than one distinct status.

dat %>%
  group_by(category, content) %>% 
  filter(n_distinct(status) >1) %>%
  distinct(category, content)

#  category content
#     <dbl> <chr>  
#1        1 A      
#2        1 Z      
#3        2 C      
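Since the question mentions that the real data may be grouped by several columns, the intersection step can be wrapped in a small function that takes the grouping columns as a character vector. This is only a sketch: content_overlap and group_cols are hypothetical names, and it assumes dplyr 1.0.0 or later for across().

```r
library(dplyr)
library(tibble)

dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
)

# Hypothetical helper: `group_cols` is a character vector naming the
# grouping columns; the `status` and `content` columns are assumed to exist.
content_overlap <- function(df, group_cols) {
  df %>%
    group_by(across(all_of(c(group_cols, "content")))) %>%
    summarise(n_status = n_distinct(status), .groups = "drop") %>%
    filter(n_status > 1) %>%   # content seen under more than one status
    select(-n_status)
}

overlap <- content_overlap(dat, "category")
```

With more grouping columns, the call would take the form content_overlap(dat, c("col1", "col2")), where the column names are placeholders for whatever the real data uses.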

Upvotes: 1
