Reputation: 26218
Lets assume, my data is like
group_id col1
1 1 A,B
2 1 B,C
3 2 A,C
4 2 B,D
5 3 A,D
6 3 A,B,C,D
I would like to summarise/mutate col1 where its elements are intersected within same group (over group_id). My required output is like (if summarising)
group_id col1
1 1 B
2 2 <NA>
3 3 A,D
or like this (if mutating)
group_id col1
1 1 B
2 1 B
3 2 <NA>
4 2 <NA>
5 3 A,D
6 3 A,D
I can easily create a union by using function toString
but scratching my head for how to have common elements in the output. Basically intersect
requires at least two arguments and is therefore not working here.
dput(df) is as under
df <- structure(list(group_id = c(1L, 1L, 2L, 2L, 3L, 3L), col1 = c("A,B",
"B,C", "A,C", "B,D", "A,D", "A,B,C,D")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Upvotes: 2
Views: 864
Reputation: 39858
One option using dplyr
and tidyr
could be:
df %>%
separate_rows(col1) %>%
count(group_id, col1) %>%
group_by(group_id) %>%
summarise(col1 = if_else(all(n == 1), NA_character_, paste(col1[n == 2], collapse = ",")))
group_id col1
<int> <chr>
1 1 B
2 2 <NA>
3 3 A,D
Upvotes: 1
Reputation: 11584
Would this work:
library(dplyr)
library(tidyr)
df %>% separate_rows(col1) %>%
group_by(group_id, col1) %>% filter(n()>1) %>%
distinct() %>% group_by(group_id) %>% summarise(col1 = toString(col1)) %>%
right_join(df %>% select(group_id) %>% distinct()) %>%
arrange(group_id)
`summarise()` ungrouping output (override with `.groups` argument)
Joining, by = "group_id"
# A tibble: 3 x 2
group_id col1
<int> <chr>
1 1 B
2 2 NA
3 3 A, D
Upvotes: 1
Reputation: 388982
You can split col1
on comma and use Reduce
+ intersect
to get common values in each group_id
.
library(dplyr)
df %>%
group_by(group_id) %>%
summarise(col1 = toString(Reduce(intersect, strsplit(col1, ','))))
# group_id col1
#* <int> <chr>
#1 1 "B"
#2 2 ""
#3 3 "A, D"
Upvotes: 1