aravind rokkam
aravind rokkam

Reputation: 61

How can I cluster a graph g created in NetworkX?

I'm trying to apply clustering to a dataset. Before that i have to divide the graph into n number of clusters and i don't know how to do it.

Upvotes: 6

Views: 13617

Answers (1)

Roohollah Etemadi
Roohollah Etemadi

Reputation: 1393

Suppose the edge list of your unweighted and un-directed graph was saved in file edges.txt. You can follow the steps below to cluster the nodes of the graph.

Step 1: get the embedding of each node in the graph. That means you need to get a continuous vector representation for each node. You can use graph embedding methods like node2vec, deepwalk, etc to obtain the embedding. Note that such methods preserve the structural similarity between the nodes of a graph in the vector representation (embedding space). The following example shows how you can do that.

import networkx as nx
G=nx.Graph();
G=nx.read_edgelist("edges.txt") # edges.txt contains the edge list of your graph

# help to draw https://networkx.github.io/documentation/networkx-1.9/examples/drawing/labels_and_colors.html
nx.draw(G,with_labels = True,node_color='b',node_size=500);

from node2vec import Node2Vec
# Generate walks
node2vec = Node2Vec(G, dimensions=2, walk_length=20, num_walks=10,workers=4)
# Learn embeddings 
model = node2vec.fit(window=10, min_count=1)
#model.wv.most_similar('1')
model.wv.save_word2vec_format("embedding.emb") #save the embedding in file embedding.emb

Step 2: apply the clustering method. Once you get vector representation of the nodes, you can cluster the nodes based on those representations. See the example below.

from sklearn.cluster import KMeans
import numpy as np


X = np.loadtxt("embedding.emb", skiprows=1) # load the embedding of the nodes of the graph
#print(X)
# sort the embedding based on node index in the first column in X
X=X[X[:,0].argsort()]; 
#print(X)
Z=X[0:X.shape[0],1:X.shape[1]]; # remove the node index from X and save in Z

kmeans = KMeans(n_clusters=2, random_state=0).fit(Z) # apply kmeans on Z
labels=kmeans.labels_  # get the cluster labels of the nodes.
print(labels)

Upvotes: 9

Related Questions