Learner_seeker
Learner_seeker

Reputation: 544

Detect Networks/communities and assign IDs

i have a table which shows two columns having IDs. The below table implies that ID in column 1 is related to ID in column 2. The schema of table is such that we have both ( say IDs are A and B , both are related. Then entry will appear twice , once as A to B and B to A ) , sample table below :

ID.1    ID.2
 A       B 
 A       C
 B       C
 C       B
 C       A
 B       A
 D       E
 E       F
 F       E
 D       F
 E       D
 F       D

( e.g. for A,B,C we see A & B are related , A & C are related , B & C are related - i tag all of them in one house and give a unique id )

Output

  ID.1    ID.2      HouseID
     A       B       X1
     A       C       X1
     B       C       X1
     C       B       X1
     C       A       X1
     B       A       X1
     D       E       X2
     E       F       X2
     F       E       X2
     D       F       X2 
     D       F       X2
     E       D       X2
     F       D       X2

How do i get the above in R ? what if i add a transitive logic for example A is related to B and A is related to C , Hence B also must know C ?

Upvotes: 0

Views: 178

Answers (1)

G5W
G5W

Reputation: 37661

As I understand the question, @Scarabee had it right. The answer depends on maximal cliques, but the bounty shows that the OP did not consider that a full answer. This answer pushes that through to assigning the HouseID.

library(igraph)

## Your sample data
Edges1 = read.table(text="ID.1    ID.2
 A       B 
 A       C
 B       C
 C       B
 C       A
 B       A
 D       E
 E       F
 F       E
 D       F
 E       D
 F       D", 
header=TRUE, stringsAsFactors=FALSE)

g1 = graph_from_edgelist(as.matrix(Edges1), directed=FALSE)
plot(g1)
MC1 = max_cliques(g1)
MC1
[[1]]
+ 3/6 vertices, named, from 8123133:
[1] A B C

[[2]]
+ 3/6 vertices, named, from 8123133:
[1] D E F

This gives the maximal cliques (the houses), but we need to construct the HouseID variable.

Edges1$HouseID = apply(Edges1, 1, 
    function(e) 
        which(sapply(MC1, function(mc) all(e %in% names(unclass(mc))))))
Edges1
   ID.1 ID.2 HouseID
1     A    B       1
2     A    C       1
3     B    C       1
4     C    B       1
5     C    A       1
6     B    A       1
7     D    E       2
8     E    F       2
9     F    E       2
10    D    F       2
11    E    D       2
12    F    D       2

The outer apply loops through the edges. The inner sapply checks which clique (house) contains both nodes from the edge.

This provides the structure that the question asked for. But as @Scarabee pointed out, a node may belong to more than one maximal clique (house). That is not exactly a problem as the requested structure assigns the HouseID to edges. Here is an example with a node that belongs to two houses.

Edges3 = read.table(text="ID.1    ID.2
 A       B 
 A       C
 B       C
 D       E
 D       A
 E       A", 
header=TRUE, stringsAsFactors=FALSE)

g3 = graph_from_edgelist(as.matrix(Edges3), directed=FALSE)
plot(g3)
MC3 = max_cliques(g3)

Edges3$HouseID = apply(Edges3, 1, 
    function(e) 
        which(sapply(MC3, function(mc) all(e %in% names(unclass(mc))))))
Edges3
  ID.1 ID.2 HouseID
1    A    B       2
2    A    C       2
3    B    C       2
4    D    E       1
5    D    A       1
6    E    A       1

Node A is in more than one maximal clique

In this case, We can still assign a HouseID to each edge, even though the node A is in two different Houses. Notice that the edge A-B has HouseID = 2, but edge D-A has HouseD = 1. The HouseID is a property of the edge, not the node.

However, there is still a problem. It is possible for both ends of an edge to belong to two houses and one cannot assign a single house to the edge.

Edges4 = read.table(text="ID.1    ID.2
 A       B 
 A       C
 B       C
 D       A
 D       B", 
header=TRUE, stringsAsFactors=FALSE)

g4 = graph_from_edgelist(as.matrix(Edges4), directed=FALSE)
plot(g4)
MC4 = max_cliques(g4)
MC4
[[1]]
+ 3/4 vertices, named, from fbd5929:
[1] A B C

[[2]]
+ 3/4 vertices, named, from fbd5929:
[1] A B D

Edge A-B is in two maximal cliques

The edge A-B belongs to two maximal cliques. As @Scarabee said, the question is not actually well-defined for all graphs.

Upvotes: 2

Related Questions