Reputation: 1377
I have a data.frame:
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))
I want to count the connections (pairs of x values that occur together) within each id. This is the output I want to get:
connections | num. of connections
A - B | 1
B - C | 1
C - D | 1
A - C | 3
A - E | 2
A - D | 2
D - E | 1
C - E | 1
How can I do this with dplyr?
Upvotes: 1
Views: 104
Reputation: 886938
Using dplyr and combn:
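library(dplyr)
df %>%
  group_by(id) %>%
  # all sorted pairs within each id; mutate() works here only because each
  # id has 3 rows, so combn() returns choose(3, 2) = 3 pairs per group
  mutate(connections = c(combn(as.character(x), 2,
           FUN = function(x) paste(sort(x), collapse = " - ")))) %>%
  group_by(connections) %>%
  summarise(numConn = n())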
# connections numConn
#1 A - B 1
#2 A - C 3
#3 A - D 2
#4 A - E 2
#5 B - C 1
#6 C - D 1
#7 C - E 1
#8 D - E 1
Or the same approach with data.table:
library(data.table)
# build the sorted pairs per id, then count how often each pair occurs
setDT(df)[, combn(as.character(x), 2,
                  FUN = function(x) paste(sort(x), collapse = " - ")),
          by = id][
  , list(numConn = .N), by = list(connections = V1)]
# connections numConn
#1: A - B 1
#2: A - C 3
#3: B - C 1
#4: D - E 1
#5: A - D 2
#6: A - E 2
#7: C - D 1
#8: C - E 1
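The dplyr version above needs the number of pairs per id to equal the number of rows per id, which holds here because every id has three rows. A minimal base R sketch that does not depend on that is below. It assumes that a value repeated within an id should count only once, and that ids with fewer than two distinct values contribute no pairs; neither point is stated in the question. The names pairs_by_id and counts are just illustrative.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"),
                 stringsAsFactors = FALSE)

# For each id, list every unordered pair of distinct x values as "P - Q"
pairs_by_id <- lapply(split(df$x, df$id), function(v) {
  v <- sort(unique(v))                   # assumption: duplicates count once
  if (length(v) < 2) return(character(0))  # combn() would error on n < m
  combn(v, 2, FUN = paste, collapse = " - ")
})

# Tabulate how many ids each pair appears in
counts <- as.data.frame(table(connections = unlist(pairs_by_id)),
                        responseName = "numConn",
                        stringsAsFactors = FALSE)
counts
```

Because the values are sorted before combn() is called, each pair always gets the same label regardless of row order within the id.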
Upvotes: 3
Reputation: 193507
It sounds like you're just looking for the crossprod function, which you can use like this:
crossprod(table(df))
# x
# x A B C D E
# A 4 1 3 2 2
# B 1 1 1 0 0
# C 3 1 3 1 1
# D 2 0 1 2 1
# E 2 0 1 1 2
This will get you closer to your desired output (pairs that never occur together, such as B and D, are still listed with a count of 0):
library(reshape2)
X <- crossprod(table(df))
X[upper.tri(X, diag = TRUE)] <- NA
melt(X, na.rm = TRUE)
# x x value
# 2 B A 1
# 3 C A 3
# 4 D A 2
# 5 E A 2
# 8 C B 1
# 9 D B 0
# 10 E B 0
# 14 D C 1
# 15 E C 1
# 20 E D 1
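To get from the co-occurrence matrix to the exact layout in the question, one possible sketch in base R (without reshape2) is to pick the cells of one triangle that have a positive count and paste the row and column labels together. The names idx and res are illustrative, and positional column indexing is used on the which() result because its column names depend on the matrix's dimnames.

```r
df <- data.frame(id = rep(1:4, each = 3),
                 x = c("A","B","C","D","E","A","A","C","D","A","C","E"))

X <- crossprod(table(df))   # x-by-x co-occurrence counts

# Positions in the upper triangle (each pair once) with at least one co-occurrence
idx <- which(upper.tri(X) & X > 0, arr.ind = TRUE)

res <- data.frame(
  connections = paste(rownames(X)[idx[, 1]], colnames(X)[idx[, 2]], sep = " - "),
  numConn     = as.vector(X[idx])
)
res
```

The diagonal is excluded by upper.tri(), so the per-value totals (for example, A appearing in 4 ids) do not show up as connections.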
Upvotes: 6