bernie2436
bernie2436

Reputation: 23901

"Agglomerative" clustering of a graph based on node weight in network X?

I have a very large connected graph (millions of nodes). Each edge has a weight -- identifying the proximity of the connected nodes. I want to find "clusters" in the graph (sets of nodes that are very close together). For instance, if the nodes were every city in the US and the edges were distance between the cities -- the clusters might be {Dallas, Houston, Fort Worth} and {New York, Bridgeport, Jersey City, Trenton}.

The clusters don't have to be the same size and not all nodes have to be in a cluster. Instead, clusters need to have some average minimum weight, W which is equal to (sum of weights in cluster) / (number of edges in cluster).

I am most comfortable in Python, and NetworkX seems to be the standard tool for this

It seems like this would not be too hard to program, although not particularly efficiently. Is there a name for the algorithm I am describing? Is there an implementation in NetworkX already?

Upvotes: 5

Views: 1968

Answers (1)

epsi1on
epsi1on

Reputation: 571

I know some graph partitioning algorithms that their goal is to make all parts with approximate same size and minimum edge cut as possible, but as you described you do not need such an algorithm. Anyways i think your problem is NP complete like many other graph partitioning problems. Maybe there be some algorithms which specifically work fine for your problem (and i think there are but i do not know them) but i think you can still find good and acceptable solutions with slight changing the some algorithms which are originally for finding minimum edge cut with same components size.

For example see this. i think you can use multilevel k-way partitioning with some changes. For example in coarsening phase, you can use Light Edge Matching. Consider a situation when in coarsening phase you've matched A and B into one group and also C and D into another group. weight of edge between these two groups is sum of edges of its members to each other e.g. W=Wac+Wad+Wbc+Wbd where W is edge weight, Wac is edge weight between A and C an so on. I also think that considering average of Wac, Wad, Wbc and Wbd instead of sum of them also worth a try.

From my experience this algorithm is very fast and i am not sure you be able to find precoded library in python to make changes into it.

Upvotes: 1

Related Questions