Reputation: 5169
I have the following data frame:
library(tidyverse)
dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
) %>%
  arrange(category, status, content)
dat
That looks like this:
> dat
  category status  content
     <dbl> <chr>   <chr>
1        1 control A
2        1 control B
3        1 control Z
4        1 treated A
5        1 treated Z
6        2 control C
7        2 control D
8        2 treated C
9        2 treated F
What I want to do is group it by category and then find the differences and the intersection of the content between control and treated.
The output for content found in control only:
category differences_control_only
1 B
2 D
The output for content found in treated only:
category differences_treated_only
1 not_available
2 F
The output for the intersection between treated and control:
category intersection
1 A
1 Z
2 C
So in the end there will be 3 data frames as output. How can I achieve that?
In this example the grouping is based on only one column (category); in real cases the grouping can be on multiple columns.
Upvotes: 0
Views: 686
Reputation: 887028
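We can use data.table to get the intersection, keeping the (category, content) groups that contain more than one distinct status: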
library(data.table)
setDT(dat)[, .SD[uniqueN(status) > 1], .(category, content)]
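The same pattern can probably be turned around to get the differences. The following is only a sketch: it rebuilds the question's data as a data.table, swaps the condition to `uniqueN(status) == 1`, and then splits the result by status. The names `dt`, `diffs`, `control_only` and `treated_only` are made up for illustration.

```r
library(data.table)

# Same rows as the data frame in the question
dt <- data.table(
  category = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  status   = c("control", "control", "treated", "treated", "control",
               "control", "control", "treated", "treated"),
  content  = c("A", "Z", "A", "Z", "B", "C", "D", "C", "F")
)

# Keep (category, content) groups that occur under exactly one status
diffs <- dt[, .SD[uniqueN(status) == 1], by = .(category, content)]

# Split the differences by the status they belong to
control_only <- diffs[status == "control", .(category, content)]
treated_only <- diffs[status == "treated", .(category, content)]
```

Note that a category with no treated-only content (category 1 here) simply has no row in `treated_only`; this sketch does not add a `not_available` placeholder.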
Upvotes: 1
Reputation: 388862
To get the differences, we can group_by category and content and keep the groups that have only one distinct status.
library(dplyr)
dat %>% group_by(category, content) %>% filter(n_distinct(status) == 1)
# category status content
# <dbl> <chr> <chr>
#1 1 control B
#2 2 control D
#3 2 treated F
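The question asks for the control-only and treated-only differences as separate data frames, with `not_available` for a category that has none. One way to get there from the filtered result is sketched below. The helper `only_for()` is hypothetical (not part of dplyr), and the left join from the distinct categories is just one assumed way to bring back the empty categories.

```r
library(dplyr)
library(tibble)

dat <- tribble(
  ~category, ~status, ~content,
  1, "control", "A",
  1, "control", "Z",
  1, "treated", "A",
  1, "treated", "Z",
  1, "control", "B",
  2, "control", "C",
  2, "control", "D",
  2, "treated", "C",
  2, "treated", "F"
)

# Rows whose (category, content) pair occurs under exactly one status
diffs <- dat %>%
  group_by(category, content) %>%
  filter(n_distinct(status) == 1) %>%
  ungroup()

# Hypothetical helper: differences for one status, one row per category at least
only_for <- function(diffs, dat, which_status) {
  hits <- diffs %>%
    filter(status == which_status) %>%
    select(category, content)
  dat %>%
    distinct(category) %>%                    # every category, even empty ones
    left_join(hits, by = "category") %>%      # unmatched categories get NA
    mutate(content = coalesce(content, "not_available"))
}

control_only <- only_for(diffs, dat, "control")
treated_only <- only_for(diffs, dat, "treated")
```

With more than one grouping column, the `distinct()` call and the `by` argument would need to list all of them.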
To get the intersections, we can group_by category and content and keep the groups that have more than one distinct status.
dat %>%
  group_by(category, content) %>%
  filter(n_distinct(status) > 1) %>%
  distinct(category, content)
# category content
# <dbl> <chr>
#1 1 A
#2 1 Z
#3 2 C
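Since the question mentions that real data may be grouped on several columns, the same idea can be written against a character vector of column names. This is a sketch under assumptions: the extra `batch` column and the `dat2` data are invented for illustration, and `group_cols` is a hypothetical variable holding the grouping columns.

```r
library(dplyr)
library(tibble)

# Invented example with a second grouping column, `batch`
dat2 <- tribble(
  ~category, ~batch, ~status, ~content,
  1, "x", "control", "A",
  1, "x", "treated", "A",
  1, "x", "control", "B",
  1, "y", "control", "A",
  1, "y", "treated", "C"
)

group_cols <- c("category", "batch")
key_cols <- c(group_cols, "content")

# Content present under more than one status within each group
intersection <- dat2 %>%
  group_by(across(all_of(key_cols))) %>%
  filter(n_distinct(status) > 1) %>%
  ungroup() %>%
  select(all_of(key_cols)) %>%
  distinct()

# Content present under exactly one status within each group
differences <- dat2 %>%
  group_by(across(all_of(key_cols))) %>%
  filter(n_distinct(status) == 1) %>%
  ungroup()
```

Only the contents of `group_cols` would need to change to group on a different set of columns.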
Upvotes: 1