Reputation: 1134
I have a dataset that looks like this
ID| Name1| Name2| Name3| Name4| Name5
1 ABC DEF MNO
2 GHI JKL DEF ABC
3 ABC JKL
4 MNO JKL
5 GHI ABC DEF
6 DEF GHI MNO
7 MNO ABC JKL
I would like to have something that looks like this
ABC|DEF|GHI|JKL|MNO
ABC 3 2 3 2
DEF 3 3 1 2
GHI 2 3 1 1
JKL 3 1 1 2
MNO 2 2 1 2
Note that "ABC" is paired with "DEF" 3 times. This happens in ID 1,2, and 5
Upvotes: 0
Views: 91
Reputation: 27802
Here is a approach using data.table
for some wrangling, and igraph
for presentation and calculation of the 'network' of pairs.
sample data
library(igraph)
library(data.table)
# sample data
DT <- fread("ID Name1 Name2 Name3 Name4 Name5
1 ABC DEF MNO NA NA
2 GHI JKL DEF ABC NA
3 ABC JKL NA NA NA
4 MNO JKL NA NA NA
5 GHI ABC DEF NA NA
6 DEF GHI MNO NA NA
7 MNO ABC JKL NA NA")
code
# melt to long
DT.long <- melt(DT, id.vars = "ID", na.rm = TRUE)
# get all comnbinations by pairs of 2 values
pairs <- DT.long[, as.data.table(t(combn(value, 2))), .(ID)]
#create graph
g <- graph_from_data_frame(pairs[,2:3], directed = FALSE)
# looks like
plot(g)
# calculate adjacency matrix
as_adjacency_matrix(g)
# ABC DEF GHI JKL MNO
# ABC . 3 2 3 2
# DEF 3 . 3 1 2
# GHI 2 3 . 1 1
# JKL 3 1 1 . 2
# MNO 2 2 1 2 .
Upvotes: 3
Reputation: 66915
Here's a dplyr/tidyr approach, I'm sure not the most concise, but hopefully very legible as to what it's doing.
library(dplyr); library(tidyr)
df1 %>% # "long" data with ID + value
tidyr::pivot_longer(-ID) %>%
filter(!is.na(value)) %>%
select(-name) -> df1_long
df1_long %>% # self-join, count pairs, reshape wide
left_join(df1_long, by = "ID") %>%
filter(value.x != value.y) %>%
count(value.x, value.y) %>%
arrange(value.y) %>% # put columns in order
tidyr::pivot_wider(names_from = value.y, values_from = n) %>%
arrange(value.x) # put rows in order
## A tibble: 5 x 6
# value.x ABC DEF GHI JKL MNO
# <chr> <int> <int> <int> <int> <int>
#1 ABC NA 3 2 3 2
#2 DEF 3 NA 3 1 2
#3 GHI 2 3 NA 1 1
#4 JKL 3 1 1 NA 2
#5 MNO 2 2 1 2 NA
Upvotes: 2