iromba
iromba

Reputation: 55

Similarity measure between graphs using NetworkX

I have two graphs A and B. They may be isomorphic, completely different, or have some similarities(few nodes are the same, or few of the nodes share the same edges).

I want to see/check how different/similar these graphs are. networkx.is_isomorphic() is a way. However, this does not say more than just true or false.

For example, the difference(A,B) function returns a new graph that contains the edges that exist in A but not in B; but it needs to have the same number of nodes.

My graphs, A and B, do not have the same number of nodes. And may have a few hundred nodes. So the algrithms will be best if not NP-Hard (Like the graph_edit_distance() function that is NP Hard).

Upvotes: 5

Views: 5385

Answers (1)

yatu
yatu

Reputation: 88236

Not sure it's quite what you're looking for, but hopefully you might find some of this useful. Let's take the following graphs G and H for instance:

l1 = [['A','C'], ['A', 'D'], ['I','F'], ['K', 'E'], ['D', 'A'], ['A', 'B'], ['C', 'D']]
l2 = [['A','B'], ['Q', 'D'], ['J','F'], ['A', 'E'], ['D', 'F'], ['X','A']]

G = nx.from_edgelist(l1)
H = nx.from_edgelist(l2)

G.nodes()
# NodeView(('A', 'C', 'D', 'I', 'F', 'K', 'E', 'B'))
H.nodes()
# NodeView(('A', 'B', 'Q', 'D', 'J', 'F', 'E', 'X'))

In order to get a similarity measure, you could probably come up with some custom definition of similarity or difference, considering the intersection of union of edges. Perhaps the Jaccard distance could be a good candidate:

def jaccard_similarity(g, h):
    i = set(g).intersection(h)
    return round(len(i) / (len(g) + len(h) - len(i)),3)

jaccard_similarity(G.edges(), H.edges())
# 0.091

What probably is also useful here is to come up with a visualisation that gives an idea of how similar and different both graphs are. We could start by using compose, which will give us the simple union of the node sets and edge sets:

GH = nx.compose(G,H)
GH.nodes()
# NodeView(('A', 'C', 'D', 'I', 'F', 'K', 'E', 'B', 'Q', 'J', 'X'))

And iterate over the composed graph's edges and nodes, and assign to these a color depending on which graph they belong to (including both at the same time too). This could also be extended to adding some attribute indicating to which graph it belongs too:

# set edge colors
edge_colors = dict()
for edge in GH.edges():
    if G.has_edge(*edge):
        if H.has_edge(*edge):
            edge_colors[edge] = 'magenta'
            continue
        edge_colors[edge] = 'lightgreen'
    elif H.has_edge(*edge):
        edge_colors[edge] = 'lightblue'

# set node colors
G_nodes = set(G.nodes())
H_nodes = set(H.nodes())
node_colors = []
for node in GH.nodes():
    if node in G_nodes:
        if node in H_nodes:
            node_colors.append('magenta')
            continue
        node_colors.append('lightgreen')
    if node in H_nodes:
        node_colors.append('lightblue')

So here the intersecting nodes and edges will have a magenta color. Otherwise they'll be green or blue if they belong to the G or H Graph respectively.

We can now plot the graph using the above dictionary/list of edge and node colors respectively. This should give quite a good way to see of both the intersecting and disjoint nodes/edges on both graphs:

pos = nx.spring_layout(GH, scale=20)
nx.draw(GH, pos, 
        nodelist=GH.nodes(),
        node_color=node_colors,
        edgelist=edge_colors.keys(), 
        edge_color=edge_colors.values(),
        node_size=800,
        width=8,alpha=0.5,
        with_labels=True)

enter image description here

Upvotes: 11

Related Questions