bs15ansj
bs15ansj

Reputation: 31

Find rows which share common elements

I have a 2D numpy array in which each row contains 2 integers. I want to find all the groups of elements belonging to rows which share common elements (a.k.a connected components of a graph from the edgelist array). For example, for the array:

[[ 0  4]
 [ 0  7]
 [ 1  2]
 [ 1 13]
 [ 2  1]
 [ 2  9]
 [ 3 14]
 [ 3 16]
 [ 4  0]
 [ 4  5]
 [ 5  4]
 [ 5  6]
 [ 6  5]
 [ 6  7]
 [ 7  0]
 [ 7  6]]

would contain the groups

[[ 0  4  5  6  7]
 [ 1  2 13  9]
 [ 3 14 16]]

Upvotes: 3

Views: 270

Answers (3)

mathfux
mathfux

Reputation: 5939

In terms of graph theory you need to create a graph from an array of edges and then find connected components of this graph. Pure numpy based solution seems too hard for me but you still can make it C level using igraph which is written in C (unlike networkx which is pure Python). You need to install python-igraph first.

Trivial case

igraph.Graph.clusters() method returns a special instance of igraph.clustering.VertexClustering class which can be converted to list:

import igraph
arr = np.array([[0, 4], [0, 7], [1, 2], [1, 9], [2, 1], [2, 8], [3, 10], 
                [3, 11], [4, 0], [4, 5], [5, 4], [5, 6], [6, 5], [6, 7], [7, 0], [7, 6]])
g = ig.Graph()
g.add_vertices(12)
g.add_edges(arr)
i = g.clusters() 
print(list(i))
#output: [[0, 4, 5, 6, 7], [1, 2, 8, 9], [3, 10, 11]]

igraph also supports plotting these connected components like it's done in networkx but you might need to download pycairo from unofficial binaries and install it in order to unlock igraph.plot option:

pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #passing a number of colors 
color = pal.get_many(i.membership) # a list of color codes for each vertex
ig.plot(g,  bbox = (200, 100), vertex_label=g.vs.indices,
        vertex_color = color, vertex_size = 12, vertex_label_size = 8)

enter image description here

General case

Notice that igraph throws an InternalError if initial array is used instead of trivial one. That's because every vertex should be declared before adding edges and all the vertices are not allowed to have missing numbers (in fact, it's allowed but reindexation is done silently and old names can be accessed using 'name' attribute). This issue can be fixed writting a custom function that creates a graph from relabelled edges:

def create_from_edges(edgelist):
    g = ig.Graph()
    u, inv = np.unique(edgelist, return_inverse=True)
    e = inv.reshape(edgelist.shape)
    g.add_vertices(u) #add vertices, not reindexed
    g.add_edges(e) #add edges, reindexed
    return g

arr = np.array([[0, 4], [0, 7], [1, 2], [1, 13], [2, 1], [2, 9], [3, 14], 
                [3, 16], [4, 0], [4, 5], [5, 4], [5, 6], [6, 5], [6, 7], [7, 0], [7, 6]])
g = create_from_edges(arr)
i = g.clusters()
print(list(i))
#output: [[0, 4, 5, 6, 7], [1, 2, 8, 9], [3, 10, 11]]

New labels were used in output (thus making it incorrect) but it's still possible to access old ones like so:

print('new_names:', g.vs.indices)
print('old_names:', g.vs['name'])
>>> new_names: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> old_names: [0, 1, 2, 3, 4, 5, 6, 7, 9, 13, 14, 16]

They can be used to preview original graph (vertex_label is different now):

pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #passing a number of colors 
color = pal.get_many(i.membership) ##a list of color codes for each vertex
ig.plot(g,  bbox = (200, 100), vertex_label=g.vs['name'], 
        vertex_color = color, vertex_size = 12, vertex_label_size = 8)

enter image description here

Finally, you need to use old names of vertices in order to fix output. It can be done like so:

output = list(i)
old_names = np.array(g.vs['name'])
fixed_output = [old_names[n].tolist() for n in output]
#new output: [[0, 4, 5, 6, 7], [1, 2, 9, 13], [3, 14, 16]]

Upvotes: 1

Ehsan
Ehsan

Reputation: 12397

If you can use libraries, assuming your array is a (Note that you cannot have components as numpy array since they can be non-rectangular array which does not exist in numpy, so this outputs them as sets):

import networkx as nx
G=nx.Graph()
G.add_edges_from(a)
print(sorted(nx.connected_components(G), key = len, reverse=True))
#[{0, 4, 5, 6, 7}, {1, 2, 13, 9}, {16, 3, 14}]

And if you need a pure numpyic solution without extra libraries, please check out this generalized solution: https://stackoverflow.com/a/61764414/4975981

Upvotes: 2

Helios
Helios

Reputation: 715

I am sure there are faster ways and I do not study graph theory but you can start with this;

x = [[ 0,  4],
 [ 0,  7],
 [ 1,  2],
 [ 1, 13],
 [ 2,  1],
 [ 2,  9],
 [ 3, 14],
 [ 3, 16],
 [ 4,  0],
 [ 4,  5],
 [ 5,  4],
 [ 5,  6],
 [ 6,  5],
 [ 6,  7],
 [ 7,  0],
 [ 7,  6]]

nodes = list(set([item for sublist in x for item in sublist]))
grps = {n: g for n, g in zip(nodes, range(len(nodes)))}

for v in x:
    t = grps[v[0]]
    f = grps[v[1]]
    if t != f:
        for n in grps:
            if grps[n] == f:
                grps[n] = t

ret = [[k for k, v in grps.items() if v==g] for g in set(grps.values())]
print(ret)

Upvotes: 1

Related Questions