Count all possible paired combinations

Question

I have a dataset that looks like this

ID  Name1  Name2  Name3  Name4  Name5
 1  ABC    DEF    MNO
 2  GHI    JKL    DEF    ABC
 3  ABC    JKL
 4  MNO    JKL
 5  GHI    ABC    DEF
 6  DEF    GHI    MNO
 7  MNO    ABC    JKL

I would like to have something that looks like this

     ABC  DEF  GHI  JKL  MNO
ABC         3    2    3    2
DEF    3         3    1    2
GHI    2    3         1    1
JKL    3    1    1         2
MNO    2    2    1    2

Note that "ABC" is paired with "DEF" 3 times. This happens in IDs 1, 2, and 5.

Jon Spring · Accepted Answer

Here's a dplyr/tidyr approach. I'm sure it's not the most concise, but hopefully it's very legible as to what it's doing.

library(dplyr)
library(tidyr)

# "Long" data: one row per ID + name, with the NA cells dropped
df1_long <- df1 %>%
  tidyr::pivot_longer(-ID) %>%
  filter(!is.na(value)) %>%
  select(-name)

# Self-join on ID, count pairs, reshape wide
# (dplyr >= 1.1.0 warns that this join is many-to-many; that is intended here
#  and can be silenced with relationship = "many-to-many")
df1_long %>%
  left_join(df1_long, by = "ID") %>%
  filter(value.x != value.y) %>%         # drop each name paired with itself
  count(value.x, value.y) %>%
  arrange(value.y) %>%                   # put columns in order
  tidyr::pivot_wider(names_from = value.y, values_from = n) %>%
  arrange(value.x)                       # put rows in order


## A tibble: 5 x 6
#  value.x   ABC   DEF   GHI   JKL   MNO
#  <chr>   <int> <int> <int> <int> <int>
#1 ABC        NA     3     2     3     2
#2 DEF         3    NA     3     1     2
#3 GHI         2     3    NA     1     1
#4 JKL         3     1     1    NA     2
#5 MNO         2     2     1     2    NA
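
For anyone who wants to cross-check those counts without the tidyverse, here is a rough base R sketch of a different technique: build an ID-by-name incidence table and take its cross-product. The `df1` below is my own reconstruction of the question's table (the original post doesn't show how the data was created, and I'm assuming the blank cells are `NA`), and the names `name_cols`, `long`, `incidence`, and `pair_counts` are just illustrative. It also assumes each name appears at most once per ID.

```r
# Assumed reconstruction of the question's data; blank cells taken to be NA.
df1 <- data.frame(
  ID    = 1:7,
  Name1 = c("ABC", "GHI", "ABC", "MNO", "GHI", "DEF", "MNO"),
  Name2 = c("DEF", "JKL", "JKL", "JKL", "ABC", "GHI", "ABC"),
  Name3 = c("MNO", "DEF", NA, NA, "DEF", "MNO", "JKL"),
  Name4 = c(NA, "ABC", NA, NA, NA, NA, NA),
  Name5 = NA_character_,
  stringsAsFactors = FALSE
)

# Stack the name columns into one long ID/value table and drop the NAs.
name_cols <- setdiff(names(df1), "ID")
long <- data.frame(
  ID    = rep(df1$ID, times = length(name_cols)),
  value = unlist(df1[name_cols], use.names = FALSE)
)
long <- long[!is.na(long$value), ]

# Incidence matrix: rows are IDs, columns are names, entries are 0/1.
incidence <- unclass(table(long$ID, long$value))

# t(incidence) %*% incidence: entry [a, b] is the number of IDs containing
# both a and b. The diagonal is each name's own frequency, so blank it out.
pair_counts <- crossprod(incidence)
diag(pair_counts) <- NA

print(pair_counts)
```

With the data as reconstructed above, `pair_counts["ABC", "DEF"]` should come out as 3, matching IDs 1, 2, and 5 from the question. The result is a plain matrix rather than a tibble, which may or may not suit what you do with it next.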
