Reputation: 61
I'm trying to apply clustering to a dataset. Before that i have to divide the graph into n number of clusters and i don't know how to do it.
Upvotes: 6
Views: 13617
Reputation: 1393
Suppose the edge list of your unweighted and un-directed graph was saved in file edges.txt. You can follow the steps below to cluster the nodes of the graph.
Step 1: get the embedding of each node in the graph. That means you need to get a continuous vector representation for each node. You can use graph embedding methods like node2vec, deepwalk, etc to obtain the embedding. Note that such methods preserve the structural similarity between the nodes of a graph in the vector representation (embedding space). The following example shows how you can do that.
import networkx as nx
G=nx.Graph();
G=nx.read_edgelist("edges.txt") # edges.txt contains the edge list of your graph
# help to draw https://networkx.github.io/documentation/networkx-1.9/examples/drawing/labels_and_colors.html
nx.draw(G,with_labels = True,node_color='b',node_size=500);
from node2vec import Node2Vec
# Generate walks
node2vec = Node2Vec(G, dimensions=2, walk_length=20, num_walks=10,workers=4)
# Learn embeddings
model = node2vec.fit(window=10, min_count=1)
#model.wv.most_similar('1')
model.wv.save_word2vec_format("embedding.emb") #save the embedding in file embedding.emb
Step 2: apply the clustering method. Once you get vector representation of the nodes, you can cluster the nodes based on those representations. See the example below.
from sklearn.cluster import KMeans
import numpy as np
X = np.loadtxt("embedding.emb", skiprows=1) # load the embedding of the nodes of the graph
#print(X)
# sort the embedding based on node index in the first column in X
X=X[X[:,0].argsort()];
#print(X)
Z=X[0:X.shape[0],1:X.shape[1]]; # remove the node index from X and save in Z
kmeans = KMeans(n_clusters=2, random_state=0).fit(Z) # apply kmeans on Z
labels=kmeans.labels_ # get the cluster labels of the nodes.
print(labels)
Upvotes: 9