user35131
user35131

Reputation: 1134

Count all possible paired combinations

I have a dataset that looks like this

ID|      Name1|        Name2|             Name3|          Name4|        Name5
 1       ABC             DEF              MNO               
 2       GHI             JKL              DEF               ABC
 3       ABC             JKL
 4       MNO             JKL
 5       GHI             ABC              DEF
 6       DEF             GHI              MNO 
 7       MNO             ABC              JKL

I would like to have something that looks like this

    ABC|DEF|GHI|JKL|MNO
ABC      3    2   3   2 
DEF   3       3   1   2   
GHI   2  3        1   1
JKL   3  1    1       2
MNO   2  2    1   2   

Note that "ABC" is paired with "DEF" 3 times. This happens in ID 1,2, and 5

Upvotes: 0

Views: 91

Answers (2)

Wimpel
Wimpel

Reputation: 27802

Here is a approach using data.table for some wrangling, and igraph for presentation and calculation of the 'network' of pairs.

sample data

library(igraph)
library(data.table)
# sample data
DT <- fread("ID      Name1        Name2             Name3          Name4        Name5
1       ABC             DEF              MNO               NA NA
2       GHI             JKL              DEF               ABC NA
3       ABC             JKL NA NA NA
4       MNO             JKL NA NA NA
5       GHI             ABC              DEF NA NA
6       DEF             GHI              MNO NA NA
7       MNO             ABC              JKL NA NA")

code

# melt to long
DT.long <- melt(DT, id.vars = "ID", na.rm = TRUE)
# get all comnbinations by pairs of 2 values
pairs <- DT.long[, as.data.table(t(combn(value, 2))), .(ID)]
#create graph
g <- graph_from_data_frame(pairs[,2:3], directed = FALSE)
# looks like
plot(g)

enter image description here

# calculate adjacency matrix
as_adjacency_matrix(g)
#     ABC DEF GHI JKL MNO
# ABC   .   3   2   3   2
# DEF   3   .   3   1   2
# GHI   2   3   .   1   1
# JKL   3   1   1   .   2
# MNO   2   2   1   2   .

Upvotes: 3

Jon Spring
Jon Spring

Reputation: 66915

Here's a dplyr/tidyr approach, I'm sure not the most concise, but hopefully very legible as to what it's doing.

library(dplyr); library(tidyr)
df1 %>%                                # "long" data with ID + value
  tidyr::pivot_longer(-ID) %>%
  filter(!is.na(value)) %>%
  select(-name) -> df1_long

df1_long %>%                           # self-join, count pairs, reshape wide
  left_join(df1_long, by = "ID") %>%
  filter(value.x != value.y) %>%
  count(value.x, value.y) %>%
  arrange(value.y) %>%                 # put columns in order
  tidyr::pivot_wider(names_from = value.y, values_from = n) %>%
  arrange(value.x)                     # put rows in order


## A tibble: 5 x 6
#  value.x   ABC   DEF   GHI   JKL   MNO
#  <chr>   <int> <int> <int> <int> <int>
#1 ABC        NA     3     2     3     2
#2 DEF         3    NA     3     1     2
#3 GHI         2     3    NA     1     1
#4 JKL         3     1     1    NA     2
#5 MNO         2     2     1     2    NA

Upvotes: 2

Related Questions