Get a list of lists of the sets which intersect each other in python

Question

Given a list of sets, I would like to get a list of lists of the sets that intersect each other. In other words, I want a list of lists such that, within each list of the output, every set has a non-empty intersection with at least one other set in the same list.

I hope I was able to explain my problem. Hopefully the following example and the rest of the post should clarify it even more.

Given,

sets = [
    set([1,3]), # A
    set([2,3,5]), # B
    set([21,22]), # C
    set([1,9]), # D
    set([5]), # E
    set([18,21]), # F
]

My desired output is:

[
    [
        set([1,3]), # A, shares elements with B
        set([2,3,5]), # B, shares elements with A 
        set([1,9]), # D, shares elements with A
        set([5]), # E shares elements with B
    ],
    [
        set([21,22]), # C shares elements with F
        set([18,21]), # F shares elements with C
    ]
]

The order of the sets in the output does NOT matter.

I would like to achieve this goal with a very fast algorithm. Performance is my first requirement.

At the moment, my solution creates a graph with as many nodes as there are sets in the original list. It then creates an edge between the nodes that represent sets A and B iff these sets have a non-empty intersection. Then it computes the connected components of that graph, which gives me my expected result.

I am wondering if there is a faster way of doing this with an algorithm which does not involve graphs.

Best, Andrea

Abhijit · Accepted Answer

As @MartijnPieters rightly said, the problem calls for graphs, and networkx comes to your rescue.

Salient Points

The nodes of the graph should be the sets
An edge exists between two nodes iff the corresponding sets intersect
From the resulting graph, find all connected components

Implementation

def intersecting_sets(sets):
    import networkx as nx
    G = nx.Graph()
    # Nodes of the graph must be hashable, so use frozensets.
    # Build a list (not a lazy map object) because it is iterated repeatedly.
    sets = [frozenset(s) for s in sets]
    for to_node in sets:
        for from_node in sets:
            # of course you don't want a self loop,
            # and only intersecting nodes are of interest
            if to_node != from_node and to_node & from_node:
                G.add_edge(to_node, from_node)
    # and remember to convert the frozensets back to sets
    return [[set(s) for s in component] for component in nx.connected_components(G)]

Output

>>> intersecting_sets(sets)
[[set([2, 3, 5]), set([1, 3]), set([5]), set([1, 9])], [set([21, 22]), set([18, 21])]]
>>> pprint.pprint(intersecting_sets(sets))
[[set([2, 3, 5]), set([1, 3]), set([5]), set([1, 9])],
 [set([21, 22]), set([18, 21])]]
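
Since the question asks about avoiding the pairwise comparison, here is a minimal sketch of one possible alternative that uses a union-find (disjoint-set) structure keyed by element instead of an explicit graph. It is not part of the networkx approach above; the function name group_intersecting and the helper names are made up for illustration. It assumes that sets with no partner should be dropped, which matches what the function above returns, and that the elements are hashable. The work grows roughly with the total number of elements rather than with the number of pairs of sets.

```python
from collections import defaultdict

def group_intersecting(sets):
    """Group sets that are linked, directly or through a chain, by shared elements."""
    # parent[i] is the union-find parent of sets[i]; each set starts as its own root.
    parent = list(range(len(sets)))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # element -> index of the first set seen that contains it
    for i, s in enumerate(sets):
        for element in s:
            j = first_owner.setdefault(element, i)
            if j != i:
                # sets[i] shares this element with sets[j]: merge their groups.
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, s in enumerate(sets):
        groups[find(i)].append(s)
    # Assumption: keep only groups of two or more sets (drop isolated sets).
    return [group for group in groups.values() if len(group) > 1]
```

On the example from the question this should yield one group containing A, B, D and E and another containing C and F. The order within and between groups is not guaranteed. Unlike the frozenset-based version, duplicate sets are kept as separate entries in the same group.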
