Chris T.
Chris T.

Reputation: 1801

Quickly calculate the number of shared neighbor between any pair of nodes from an adjacency matrix

I'm wondering if there's a way to quickly calculate the number of shared neighbors between any pairs of nodes (i.e., the number of nodes connected to both node i and j) from an adjacency matrix (like the one below) and then return the output in matrix format?

I've referenced these following posts, Find common neighbors of selected vertices Counting the number of neighbors in common between two vertices in R Calculate number of common neighbors in igraph R

but can't seem to find many clues to do what I want to achieve. I was told that this can be directly calculated as adjm'*adjm, but I am not sure if this makes sense. It will be really appreciated if someone could shed some light on this.

# create an adj. matrix
adjm <- matrix(sample(0:1, 100, replace=TRUE, prob=c(0.6,0.4)), nc=10)

# set diagonal element to 0
diag(adjm) <- 0

# making it symmetric 
adjm[lower.tri(adjm)] = t(adjm)[lower.tri(adjm)]

Upvotes: 3

Views: 1386

Answers (3)

Szabolcs
Szabolcs

Reputation: 25703

If the graph is undirected, then the matrix of the number of shared neighbours is A.A, where I used . to denote the matrix product. As others note, the correct operator in R is %*%. Transposition is not necessary, since in the undirected case, the adjacency matrix is symmetric: A' = A.

If the graph is directed, let's assume that A_{ij} denotes the number of connections from i to j (this is igraph's interpretation).

Then the i,j element of A'.A gives the number of nodes that point to both i and j (i <- v -> j), and A.A' gives the number of nodes that both i and j point to (i -> v <- j). These quantities are sometimes called cocitation and bibliographic coupling, respectively, and can be computed using cocitation() and bibcoupling() in igraph. In the undirected case, both of these functions return the same, thus you may use either.

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101373

I think @G5W's solution is super concise already, highly recommended!

Below is a graph theory way to solve it, maybe not that efficient:

> nb <- neighborhood(g,mindist = 1)

> outer(nb, nb, FUN = Vectorize(function(a, b) length(intersect(a, b))))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    6    1    1    4    2    2    2    5    3     1
 [2,]    1    3    2    0    1    3    0    1    2     1
 [3,]    1    2    4    1    2    4    1    1    3     3
 [4,]    4    0    1    4    2    1    1    3    2     1
 [5,]    2    1    2    2    4    2    0    1    3     2
 [6,]    2    3    4    1    2    5    1    2    3     3
 [7,]    2    0    1    1    0    1    2    2    1     1
 [8,]    5    1    1    3    1    2    2    5    3     1
 [9,]    3    2    3    2    3    3    1    3    7     3
[10,]    1    1    3    1    2    3    1    1    3     4

> t(adjm) %*% adjm
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    6    1    1    4    2    2    2    5    3     1
 [2,]    1    3    2    0    1    3    0    1    2     1
 [3,]    1    2    4    1    2    4    1    1    3     3
 [4,]    4    0    1    4    2    1    1    3    2     1
 [5,]    2    1    2    2    4    2    0    1    3     2
 [6,]    2    3    4    1    2    5    1    2    3     3
 [7,]    2    0    1    1    0    1    2    2    1     1
 [8,]    5    1    1    3    1    2    2    5    3     1
 [9,]    3    2    3    2    3    3    1    3    7     3
[10,]    1    1    3    1    2    3    1    1    3     4

Upvotes: 1

G5W
G5W

Reputation: 37641

Yes, you can get the number of shared neighbors by computing the matrix product of adjm' and adjm. Since you are using R, adjm'*adjm means the component-wise product of the matrices. We want the matrix product, so you need to use %*%. I will use that below.

To simplify the notation, I will denote adjm = A where A[i,j] is 1 if there is a link between nodes i and j (they are neighbors) and A[i,j] = 0 otherwise.

Let's compute t(A) %*% A.

The i-jth coordinate of t(A) %*% A is

(t(A) %*% A)[i,j] = 
sum(t(A)[i,k] * A[k,j])  =
sum(A[k,i] * A[k,j])

All of the products in the sum are either 0 or 1. If both A[k,i]=1 and A[k,j]=1, the product is 1,
otherwise it is zero. So (t(A)%*%A)[i,j] is equal to the number of different k's for which both A[k,i]=1 and A[k,j]=1. But A[k,i]=1 means k is a neighbor of i and A[k,j]=1 means k is a neighbor of j, so (t(A)%*%A)[i,j] is equal to the number of different k's for which k is a neighbor of both i and j.

Let's try it out on your example. In order to make the results reproducible, I set the random.seed.

library(igraph)

## For reproducibility
set.seed(1492)

# create an adj. matrix
adjm <- matrix(sample(0:1, 100, replace=TRUE, prob=c(0.6,0.4)), nc=10)

# set diagonal element to 0
diag(adjm) <- 0

# making it symmetric 
adjm[lower.tri(adjm)] = t(adjm)[lower.tri(adjm)]

Shared = t(adjm) %*% adjm

g = graph_from_adjacency_matrix(adjm, mode = "undirected")
plot(g)

Example graph

Notice for example that Shared[1,4] = 4. That is because nodes 1 and 4 have four shared neighbors,
Nodes 2,3,6 and 9. Shared[5,7]=0 because nodes 5 and 7 have no neighbors in common.

Upvotes: 3

Related Questions