Reputation: 343
I'm trying to cluster some unlabeled, unweighted, undirected graphs. I'd like to calculate some scalar features for each of them to build an embedding vector, and then use a clustering algorithm to see if they can be distinguished in an unsupervised way. However, most of the features I was planning to use (closeness/betweenness centrality, clustering coefficient) are quite expensive to compute, and I don't have access to any significant hardware.
What are some representative features of the graphs that have a lower computational complexity to extract? Say, around O(n) or O(m). For this task I'm using the Python library networkx.
Upvotes: 2
Views: 34
Reputation: 601
If you're using networkx, I'd suggest looking through their list of algorithms to find some ideas. Of course, not all of them will run in linear or even polynomial time, but polynomial time may be fast enough for you (depending on the size of the graphs you want to analyze), and if you're not sure about runtime, you can always run a few empirical tests to see how runtime scales with graph size.
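As a concrete sketch of such an empirical test, you could time a candidate feature on random graphs of increasing size and watch how the runtime grows. Here the average clustering coefficient is used purely as an example of a potentially expensive feature; the graph sizes and edge probability are arbitrary:

```python
import time

import networkx as nx

# Time one candidate feature on Erdos-Renyi random graphs of growing size
# to get a rough sense of how its runtime scales.
for n in [100, 200, 400, 800]:
    g = nx.gnp_random_graph(n, 0.05, seed=42)
    start = time.perf_counter()
    nx.average_clustering(g)
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.4f}s")
```

If the measured time roughly quadruples when you double n, the feature is behaving quadratically on your graphs, and you can decide whether that's affordable.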
Here's a short list of features that can be computed quickly to get you started:
import networkx as nx

def get_features(g: nx.Graph) -> list[float]:
    return [
        g.number_of_nodes(),
        g.number_of_edges(),
        nx.density(g),
        nx.number_of_isolates(g),
        # greedy approximation of chromatic number
        len(set(nx.coloring.greedy_color(g, strategy="largest_first").values())),
    ]

graphs = [
    nx.cycle_graph(8),
    nx.empty_graph(8),
    nx.star_graph(7),
    nx.path_graph(8),
    nx.hypercube_graph(3),
]

for g in graphs:
    print(get_features(g))
# [8, 8, 0.2857142857142857, 0, 2]
# [8, 0, 0, 8, 1]
# [8, 7, 0.25, 0, 2]
# [8, 7, 0.25, 0, 2]
# [8, 12, 0.42857142857142855, 0, 2]
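To close the loop on the clustering goal from the question, here's a minimal sketch of feeding these feature vectors into k-means. It assumes scikit-learn is available; the choice of n_clusters=2 is arbitrary for illustration, and standardizing the features first matters because node/edge counts and density live on very different scales:

```python
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def get_features(g: nx.Graph) -> list[float]:
    return [
        g.number_of_nodes(),
        g.number_of_edges(),
        nx.density(g),
        nx.number_of_isolates(g),
        # greedy approximation of chromatic number
        len(set(nx.coloring.greedy_color(g, strategy="largest_first").values())),
    ]

graphs = [
    nx.cycle_graph(8),
    nx.empty_graph(8),
    nx.star_graph(7),
    nx.path_graph(8),
    nx.hypercube_graph(3),
]

# Standardize each feature to zero mean / unit variance, then cluster.
X = StandardScaler().fit_transform([get_features(g) for g in graphs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

From here you'd experiment with the number of clusters (or a method like DBSCAN that doesn't require fixing it) and inspect which graphs land together.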
Upvotes: 2