Ben
Ben

Reputation: 123

Generating distinct groups of nodes in a network

The Issue

Given the following network of nodes and edges, I would like to derive all possible groupings of nodes where all nodes within a group are connected to all other nodes within that group via an edge.

Network

In this network...

In other words, the rules are as follows:

  1. All members of a group must be connected to all other members of that group directly via an edge.

  2. An object may be a member of multiple groups.

  3. No redundant groups. If a group can fit within a larger group, it is not a group. (Ex. 'B' and 'C' do not comprise a valid group on their own because they both fit within the larger group of 'B', 'C', and 'F'). An object may only be in a singular group (ex. A-A) if it belongs to no other groups.


I have represented the network above as a dataframe where each row represent pairs of nodes (x1 and x2) bound by an edge:

x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")

df <- data.frame(x1, x2)

Given this df, I would like to derive the following valid groups (provided in visual as well as data frame form):

enter image description here

     1    2    3    4   
1    A    B    B    D       
2   NULL  C    D    E 
3   NULL  F   NULL NULL 

**Note: the order of groups/group names is irrelevant.


What I have tried

I have attempted to loop through a list of each unique node name in column x1 of df to identify all nodes that each node is connected to. I then use this information to generate group rosters. However, these group rosters are sometimes invalidated by violating rule 1. Here is what I have thus far...

n <- nrow(as.data.frame(unique(df$x1)))

RosterGuide <- as.data.frame(matrix(nrow = n , ncol = 1)) 
RosterGuide$V1 <- seq.int(nrow(RosterGuide))
RosterGuide$Object <- (unique(df$x1))
colnames(RosterGuide) <- c("V1","Object")
groups_frame <- matrix(, ncol= length(n), nrow = length(n))

for (loopItem in 1:nrow(RosterGuide)) {

object <- subset(RosterGuide$Object, RosterGuide$V1 == loopItem)
group <- as.data.frame(subset(df$x2, df$x1 == object))

groups_frame <- cbind.fill(group, groups_frame, fill = "NULL")
}

Groups <- as.data.frame(groups_frame)
Groups <- subset(Groups, select = - c(object))
colnames(Groups) <- RosterGuide$V1

... this loop yields the data frame 'Groups'...

     1    2    3    4   5    6
1    B    D    B    B   B    A
2    C    E    D    C   C NULL
3    F NULL    E    F   D NULL
4 NULL NULL NULL NULL   F NULL

This is where I am at. You can see that group 3 violates the first rule because 'B' and 'E' are not directly connected by an edge, group 5 violates the first rule because 'F' and 'D' and 'F' and 'C' are not directly connected via an edge, and group 4 violates the third rule because it is a duplication of group 1 (I am less worried about 3rd rule violations, I can solve that one easily).

I am at a loss in trying to get from the data frame 'Groups' to the valid output I suggested above in a way that is universal to any data frame like df (2 columns, infinite rows) that describes the nodes and edges of a network of any size.

Upvotes: 7

Views: 1290

Answers (1)

Henrik
Henrik

Reputation: 67778

Convert your data frame representation of the network to an igraph object. Use max_cliques to find "all the maximal cliques in an undirected graph".

library(igraph)
g <- graph_from_data_frame(df, directed = FALSE)
mc <- max_cliques(g, min = 1)
mc
# [[1]]
# + 1/6 vertex, named, from eb2aa45:
# [1] A
# 
# [[2]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D E
# 
# [[3]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D B
# 
# [[4]]
# + 3/6 vertices, named, from eb2aa45:
# [1] B F C

Grab the names of the vertices of the maximal cliques. Create corresponding group numbers and convert to data frame:

nm <- lapply(mc, attr, "names")
d <- data.frame(g = rep(seq_len(length(nm)), lengths(nm)), vert = unlist(nm))
d
#   g vert
# 1 1    A
# 2 2    D
# 3 2    E
# 4 3    D
# 5 3    B
# 6 4    B
# 7 4    F
# 8 4    C

simplify graph, plot it, highlight vertex groups using the list above in mark.groups. Prettify according to taste (see ?plot.igraph).

plot(simplify(g), mark.groups = nm, mark.border = "red", mark.col = NA)

enter image description here

Upvotes: 6

Related Questions