James Adams
James Adams

Reputation: 8727

Python: How to get items that appear in only one set of a list of sets?

I want to create a function that takes a list of one or more sets and finds the symmetric difference of all of the sets in the list, i.e. the result should be a set of values, each of which is contained in only one of the individual sets. (Please correct me if I'm wrong about this being the symmetrical difference.)

For example:

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 3, 7])
>>> s4 = set([2, 5, 9])
>>> myfunc([s1, s2, s3, s4])
{1, 4, 5, 7, 9}

Is there something built in that could be used above in place of myfunc? Or do I use something like this:

def myfunc(sets: List[set]) -> set:

    sd = set()
    goners = set()
    for s in sets:
        still_ok = s - goners
        sd = sd.symmetric_difference(still_ok)
        goners = goners.union(s.difference(sd))
    return sd

Is there a better/more efficient/"Pythonic" way to do this?

Upvotes: 3

Views: 3313

Answers (5)

yukashima huksay
yukashima huksay

Reputation: 6238

What about this:

from collections import Counter

s1 = set([1, 2, 3])
s2 = set([2, 3, 4])
s3 = set([2, 3, 7])
s4 = set([2, 5, 9])
print([k for k,v in Counter((*s1,*s2,*s3,*s4)).items() if v == 1])

Eventho this looks nice since it's a oneliner you have to keep in mind that it's a little slower than your own approach:

In [85]: def nicefunc(sets): 
    ...:     return [k for k,v in Counter(itertools.chain.from_iterable(sets)).items() if v == 1] 
    ...:                                                                                                                                                                                       

In [86]: def nicefunc2(sets): 
    ...:     return [k for k,v in Counter( [i for s in sets for i in s]).items() if v == 1] 
    ...:                                                                                                                                                                                       

In [87]: def nicefunc3(): 
    ...:     return [k for k,v in Counter((*s1,*s2,*s3,*s4)).items() if v == 1] 
    ...:                                                                                                                                                                                       

In [88]: def myfunc(sets): 
    ...:     sd = set() 
    ...:     goners = set() 
    ...:     for s in sets: 
    ...:         still_ok = s - goners 
    ...:         sd = sd.symmetric_difference(still_ok) 
    ...:         goners = goners.union(s.difference(sd)) 
    ...:     return sd 
    ...:                                                                                                                                                                                       

In [89]: sets = [s1, s2, s3, s4]                                                                                                                                                               

In [90]: %timeit myfunc(sets)                                                                                                                                                                  
2.25 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [91]: %timeit nicefunc(sets)                                                                                                                                                                
3.64 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [92]: %timeit nicefunc2(sets)                                                                                                                                                               
3.79 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [94]: %timeit nicefunc3()                                                                                                                                                                   
3.64 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You could also pick another approach which is still a oneliner but faster:

In [152]: def coolfunc(sets): 
     ...:     return set.union(*[sets[i]-set.union(*sets[:i],*sets[i+1:]) for i in range(len(sets))]) 

In [153]: coolfunc(sets)                                                                                                                                                                       
Out[153]: {1, 4, 5, 7, 9}

In [154]: %timeit coolfunc(sets)                                                                                                                                                               
3.34 µs ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

However, as pointed out by @VBrail you have gotten the definition of symmetric set difference of a collection of sets wrong. Here is a one-liner for calculating the actual symmetric set difference of a collection which is defined as

the symmetric difference of a collection of sets contains just elements which are in an odd number of the sets in the collection wikipedia

from functools import reduce                                                                                                                                                          
s1 = set([1, 2, 3]) 
s2 = set([2, 3, 4]) 
s3 = set([2, 3, 7]) 
s4 = set([2, 5, 9])                                                                                                                                                                   
sets = [s1,s2,s3,s4]                                                                                                                                                                  
reduce(set.symmetric_difference, sets)      

{1, 3, 4, 5, 7, 9}

Upvotes: 3

GZ0
GZ0

Reputation: 4263

For operations on built-in Python objects that can be done using both operators and functions, the operator versions are generally faster than the function versions since there is overhead in accessing instance attributes and making explicit function calls. Also, performing in-place updates on collections can avoid creating extra copies of data and makes the program more efficient.

An improved version of your approach using set operators looks like this:

def myfunc_improved(sets: List[set]) -> set:
    sd = set()
    goners = set()
    for s in sets:
        sd ^= s - goners
        goners |= s - sd
    return sd

Performance measurements:

%timeit myfunc(sets)
%timeit myfunc_improved(sets)

3.19 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.75 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Upvotes: 3

neutrino_logic
neutrino_logic

Reputation: 1299

The itertools module is kind of useful for things like this:

import itertools as it

def only_exists_in_one_set(target):
    remover = []
    case = it.combinations(target, 2) #generate all combinations ignores order
    while True:
        try:
            temp = next(case)
            # AND all combos to find duplicates
            remover.append(temp[0] & temp[1])
        except StopIteration:
            break
    #flatten the nested list of sets passed to the function:
    target = [x for each_set in target for x in each_set]
    #flatten remover, eliminate duplicates with set
    for val in set([x for each_set in remover for x in each_set]):
        target = [a for a in target if a != val]    #remove all duplicate values
    return sorted(target)

>>> only_exists_in_one_set([{1,2,3},{2,3,4},{2,3,7},{2,5,9}])

>>> [1, 4, 5, 7, 9]

Not as concise as many approachs, but perhaps readable?

Upvotes: 1

André Müller
André Müller

Reputation: 117

You want a set B containing all members which are exclusively contained in one of your sets in A. What about the following (Python 3)?

from functools import reduce
A = [set([1, 2, 3]), set([2, 3, 4]), set([2, 3, 7]), set([2, 5, 9])]
B = set()
for i in range(len(A)):
    U = reduce(set.union, A[:i]+A[(i+1):])
    B = B.union(set.difference(A[i], U))

print(B)

{1, 4, 5, 7, 9}

Upvotes: 3

vBrail
vBrail

Reputation: 191

first yes your observation is wrong symmetric_difference of multiple sets is not the set of elements which occurs only in the individual set instead it is a set of elements whose total count in all set is odd, Hence the symmetric_difference of(s1,s2,s3,s4) will be {1, 3, 4, 5, 7, 9}.

def s_diff(li):
    res=set()
    for s in li:
        res =res.symmetric_difference(s)
    return res


output:
s_diff([s1,s2,s3,s4])
{1, 3, 4, 5, 7, 9}

Upvotes: 3

Related Questions