Reputation: 27
I have something like this:
df <- data.frame('C1' = c("A","A","C","D","E","E","E"), 'C2' = c("B1","B2","B3","B1","B1","B2","B4"))
And I would like something like that:
C1 C2
A B1
A B2
E B1
E B2
E B4
Then I would like something like that:
B1 B2 2 (because A and E)
B1 B4 1 (because only E)
B2 B4 1 (because only E)
Upvotes: 1
Views: 64
Reputation: 14764
You could try:
library(dplyr)
df %>%
full_join(df, by = 'C1') %>%
filter(!C2.x == C2.y,
!gsub("\\D", "", C2.y) < gsub("\\D", "", C2.x)
) %>%
count(Combination = paste(C2.x, C2.y))
Output:
# A tibble: 3 x 2
Combination n
<chr> <int>
1 B1 B2 2
2 B1 B4 1
3 B2 B4 1
Upvotes: 2
Reputation: 39858
One dplyr
and tidyr
approach could be:
df %>%
group_by(C1) %>%
filter(n() >= 2) %>%
mutate(C3 = list(combn(C2, 2, FUN = paste, collapse = ","))) %>%
slice(1) %>%
unnest(C3) %>%
ungroup() %>%
count(C3)
C3 n
<chr> <int>
1 B1,B2 2
2 B1,B4 1
3 B2,B4 1
First, it filters out cases from C1 with only one observation. Then, it combines every two elements from C2 per group. Finally, it counts the number of appearances per pair.
Considering there could be duplicated C2 values per group:
df %>%
group_by(C1) %>%
filter(n() >= 2) %>%
mutate(C3 = list(combn(unique(C2), 2, FUN = paste, collapse = ","))) %>%
slice(1) %>%
unnest(C3) %>%
ungroup() %>%
count(C3)
Upvotes: 3
Reputation: 19
Try to use dplyr and data.table libraries
install.package("tidyverse")
install.package("data.table")
library(tidyverse)
library(data.table)
df2 = df %>%
filter(C1 %in% c('A', 'E'))
Upvotes: -1