qytm
qytm

Reputation: 43

How to find duplicated subsets of several sets?

I have several sets like following:

A={0, 2, 3, 4}

B={1, 2, 4}

C={0, 1, 5}

Then, I have found subsets of each set:

SubsetA=[{0}, {2}, {3}, {4}, {0, 2}, {0, 3}, {0, 4}, {2, 3}, {2, 4}, {3, 4}, {0, 2, 3}, {0, 2, 4}
         {0, 3, 4}, {2, 3, 4}, {0, 2, 3, 4}]

SubsetB=[{1}, {2}, {4}, {1, 2}, {1, 4}, {2, 4}, {1, 2, 4}]

SubsetC=[{0}, {1}, {5}, {0, 1}, {0, 5} {1, 5}, {0, 1, 5}]

What I would like to do is to find duplicated subsets then remove them from the final output. For instance, the output of the above example is 24 since there are 24 subsets left after duplicated ones are removed like following:

{0}, {2}, {3}, {4}, {0, 2}, {0, 3}, {0, 4}, {2, 3}, {2, 4}, {3, 4}, {0, 2, 3}, {0, 2, 4}
{0, 3, 4}, {2, 3, 4}, {0, 2, 3, 4}, {1}, {1, 2}, {1, 4}, {1, 2, 4}, {5}, {0, 1},
{0, 5}, {1, 5}, {0, 1, 5}

I am able to find all subsets by using python script (this finds subsets of only one set); however, I am not sure how to find duplicated subsets of several sets and remove them.

I do not need to generate these subsets. Instead, I need to find just the number 24 as in the above example.

Note: I have at most 4 sets with sizes range from 15 to 35 as following:

A={32, 33, 8, 9, 10, 11, 12, 13, 14, 15, 27, 28, 29, 30, 31}

B={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 34, 35, 36, 37, 38, 39, 40, 41}

C={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 35, 43}

D={34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48}

Is there any suggestion?

my subset code:

def subsetsUtil(A, subset, index): 
    print(*subset) 
    for i in range(index, len(A)):  

        # include the A[i] in subset.  
        subset.append(A[i]) 

        # move onto the next element.  
        subsetsUtil(A,subset, i + 1)  

        # exclude the A[i] from subset and  
        # triggers backtracking. 
        subset.pop(-1)  
    return

    # below function returns the subsets of vector A.  
def subsets(A): 
    global res 
    subset = [] 

    # keeps track of current element in vector A  
    index = 0
    subsetsUtil(A, subset, index)  

    # Driver Code 

    # find the subsets of below vector.  
    array = [0, 2, 3, 4]

    # res will store all subsets.  
    # O(2 ^ (number of elements inside array))  
    # because at every step we have two choices  
    # either include or ignore.  
    subsets(array)  

Upvotes: 2

Views: 649

Answers (4)

tobias_k
tobias_k

Reputation: 82889

You write that some of your sets have 35 elements, which would mean that they have ~34 billion subsets. Generating, merging and counting all those subsets is not really an option. If you just need the number of total subsets, and not the subsets themselves, you could calculate the number of subsets of each individual set as 2n-1 and combine those for all combination of 1, 2, 3, ... sets.

>>> sets = [{0, 2, 3, 4}, {1, 2, 4}, {0, 1, 5}, {1, 2, 3, 4, 5}]
>>> num_subsets = lambda s: 2**len(s) - 1
>>> [num_subsets(s) for s in sets]
[15, 7, 7, 31]
# "naive" solution for reference
>>> len(set(frozenset(c) for s in sets for i in range(1, len(s)+1)
...                      for c in combinations(s, i)))
...
42
# combining subset counts of combinations of sets
>>> sum((-1)**k * sum(num_subsets(reduce(set.intersection, s))
...                   for s in combinations(sets, k+1)) 
...     for k in range(len(sets)))
...
42

The idea is to count all the subsets of individual sets, and then getting the subsets of all pairwise intersections of sets (i.e. the duplicate subsets) and subtracting those from the first. Since those could again have duplicates if e.g. one or more element appears in three sets, we have to get the intersection of three sets next and add those to the result again, then subtract the subsets of intersections of four, and so on. This is done with the (-1)**i factor.

Of course, this pretty uselessly calculates all the combinations of N sets, even if there can not be any intersection between them (don't have to check the intersection A & B & X if A & B already is the empty set), so you could expand this to only go "deeper" if the current selection of sets still has common elements, but that's left as an exercise to the interested reader. (But if you have "at most 4 sets" anyway, then that's really not necessary.)


Note that while the "naive" approach (generating all the subsets and collecting them in a set) has its complexity in the "generating subsets" part, this algorithm has the complexity in the "finding intersections of combinations of sets" part. Thus, in your case, with few sets with many elements, this algorithm is much faster then the naive one, but if you had many sets with few elements each, the situation would be reversed and the naive would be much faster.

>>> gen_sets = lambda n, m: [{random.randint(0,2*n) for _ in range(n)}
...                          for _ in range(m)]
...    
>>> sets = gen_sets(15, 10) # few sets, many elements
>>> %timeit naive()
10 loops, best of 3: 168 ms per loop
>>> %timeit this()
100 loops, best of 3: 4.73 ms per loop

>>> sets = gen_sets(10, 15) # many sets, few elements
>>> %timeit naive()
100 loops, best of 3: 11.6 ms per loop
>>> %timeit this()
10 loops, best of 3: 162 ms per loop

If both the number of elements and number of sets is large(-ish), the above mentioned optimization might help, but even then it is trivial to construct cases where no intersections can be pruned, e.g. if all the sets share a common element.

Upvotes: 4

Riccardo Bucco
Riccardo Bucco

Reputation: 15364

Here is a a possible solution:

from itertools import combinations

sets = [{0, 2, 3, 4}, {1, 2, 4}, {0, 1, 5}]
result = set(frozenset(c)
             for s in sets
             for i in range(len(s)+1)
             for c in combinations(s, i))

My solution includes also the empty set.

Upvotes: 2

AndrejH
AndrejH

Reputation: 2109

Here is an example. You just need to combine all the subsets you get into a single list, where a subset is represent with a tuple.

sets = [(6,5), (4,5), (5,6), (6,5)]

sets_sorted = list(map(lambda x: tuple(sorted(x)),sets))
unique_sets = list(set(sets_sorted))
# Unique sets: [(4, 5), (5, 6)]

Basically you sort the tuple if the same numbers appear in a different order in a subset and then you convert them back to tuple. This is needed in order for set to remove the duplicates, because a tuple is a hashable type. Tuples are then put into a set which removes the duplicates and then converted back to a list. Then you can access them or get their count by len(unique_sets).

Upvotes: 0

Serge Ballesta
Serge Ballesta

Reputation: 148880

Just use... set if you want to find unique elements!

tot = set((frozenset(i) for i in SubsetA +SubsetB+SubsetC))
print(len(tot))

gives 23

I had to use frozenset because as a set is a mutable type it is not hashable and cannot be used in a set or as a key in a dict. If you want a list of plain sets, just use another comprehension:

print([set(i) for i in tot])

gives:

[{0, 2, 3}, {1, 2}, {1, 5}, {1, 2, 4}, {5}, {0, 1, 5}, {3}, {0}, {0, 5}, 
 {2, 3, 4}, {3, 4}, {2, 3}, {0, 2}, {2}, {0, 1}, {0, 4}, {0, 2, 3, 4}, {4}, 
 {0, 3}, {2, 4}, {0, 3, 4}, {1, 4}, {1}]

Upvotes: 2

Related Questions