Reputation: 31
I have a 2D numpy array in which each row contains 2 integers. I want to find all the groups of elements belonging to rows which share common elements (a.k.a connected components of a graph from the edgelist array). For example, for the array:
[[ 0 4]
[ 0 7]
[ 1 2]
[ 1 13]
[ 2 1]
[ 2 9]
[ 3 14]
[ 3 16]
[ 4 0]
[ 4 5]
[ 5 4]
[ 5 6]
[ 6 5]
[ 6 7]
[ 7 0]
[ 7 6]]
would contain the groups
[[ 0 4 5 6 7]
[ 1 2 13 9]
[ 3 14 16]]
Upvotes: 3
Views: 270
Reputation: 5939
In terms of graph theory you need to create a graph from an array of edges and then find connected components of this graph. Pure numpy
based solution seems too hard for me but you still can make it C level using igraph
which is written in C (unlike networkx
which is pure Python). You need to install python-igraph
first.
igraph.Graph.clusters()
method returns a special instance of igraph.clustering.VertexClustering
class which can be converted to list
:
import igraph
arr = np.array([[0, 4], [0, 7], [1, 2], [1, 9], [2, 1], [2, 8], [3, 10],
[3, 11], [4, 0], [4, 5], [5, 4], [5, 6], [6, 5], [6, 7], [7, 0], [7, 6]])
g = ig.Graph()
g.add_vertices(12)
g.add_edges(arr)
i = g.clusters()
print(list(i))
#output: [[0, 4, 5, 6, 7], [1, 2, 8, 9], [3, 10, 11]]
igraph
also supports plotting these connected components like it's done in networkx
but you might need to download pycairo
from unofficial binaries and install it in order to unlock igraph.plot
option:
pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #passing a number of colors
color = pal.get_many(i.membership) # a list of color codes for each vertex
ig.plot(g, bbox = (200, 100), vertex_label=g.vs.indices,
vertex_color = color, vertex_size = 12, vertex_label_size = 8)
Notice that igraph
throws an InternalError
if initial array is used instead of trivial one. That's because every vertex should be declared before adding edges and all the vertices are not allowed to have missing numbers (in fact, it's allowed but reindexation is done silently and old names can be accessed using 'name' attribute). This issue can be fixed writting a custom function that creates a graph from relabelled edges:
def create_from_edges(edgelist):
g = ig.Graph()
u, inv = np.unique(edgelist, return_inverse=True)
e = inv.reshape(edgelist.shape)
g.add_vertices(u) #add vertices, not reindexed
g.add_edges(e) #add edges, reindexed
return g
arr = np.array([[0, 4], [0, 7], [1, 2], [1, 13], [2, 1], [2, 9], [3, 14],
[3, 16], [4, 0], [4, 5], [5, 4], [5, 6], [6, 5], [6, 7], [7, 0], [7, 6]])
g = create_from_edges(arr)
i = g.clusters()
print(list(i))
#output: [[0, 4, 5, 6, 7], [1, 2, 8, 9], [3, 10, 11]]
New labels were used in output (thus making it incorrect) but it's still possible to access old ones like so:
print('new_names:', g.vs.indices)
print('old_names:', g.vs['name'])
>>> new_names: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> old_names: [0, 1, 2, 3, 4, 5, 6, 7, 9, 13, 14, 16]
They can be used to preview original graph (vertex_label
is different now):
pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #passing a number of colors
color = pal.get_many(i.membership) ##a list of color codes for each vertex
ig.plot(g, bbox = (200, 100), vertex_label=g.vs['name'],
vertex_color = color, vertex_size = 12, vertex_label_size = 8)
Finally, you need to use old names of vertices in order to fix output. It can be done like so:
output = list(i)
old_names = np.array(g.vs['name'])
fixed_output = [old_names[n].tolist() for n in output]
#new output: [[0, 4, 5, 6, 7], [1, 2, 9, 13], [3, 14, 16]]
Upvotes: 1
Reputation: 12397
If you can use libraries, assuming your array is a
(Note that you cannot have components as numpy array since they can be non-rectangular array which does not exist in numpy, so this outputs them as sets):
import networkx as nx
G=nx.Graph()
G.add_edges_from(a)
print(sorted(nx.connected_components(G), key = len, reverse=True))
#[{0, 4, 5, 6, 7}, {1, 2, 13, 9}, {16, 3, 14}]
And if you need a pure numpyic solution without extra libraries, please check out this generalized solution: https://stackoverflow.com/a/61764414/4975981
Upvotes: 2
Reputation: 715
I am sure there are faster ways and I do not study graph theory but you can start with this;
x = [[ 0, 4],
[ 0, 7],
[ 1, 2],
[ 1, 13],
[ 2, 1],
[ 2, 9],
[ 3, 14],
[ 3, 16],
[ 4, 0],
[ 4, 5],
[ 5, 4],
[ 5, 6],
[ 6, 5],
[ 6, 7],
[ 7, 0],
[ 7, 6]]
nodes = list(set([item for sublist in x for item in sublist]))
grps = {n: g for n, g in zip(nodes, range(len(nodes)))}
for v in x:
t = grps[v[0]]
f = grps[v[1]]
if t != f:
for n in grps:
if grps[n] == f:
grps[n] = t
ret = [[k for k, v in grps.items() if v==g] for g in set(grps.values())]
print(ret)
Upvotes: 1