Reputation: 8727
I want to create a function that takes a list of one or more sets and finds the symmetric difference of all of the sets in the list, i.e. the result should be a set of values, each of which is contained in only one of the individual sets. (Please correct me if I'm wrong about this being the symmetrical difference.)
For example:
>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 3, 7])
>>> s4 = set([2, 5, 9])
>>> myfunc([s1, s2, s3, s4])
{1, 4, 5, 7, 9}
Is there something built in that could be used above in place of myfunc
? Or do I use something like this:
def myfunc(sets: List[set]) -> set:
sd = set()
goners = set()
for s in sets:
still_ok = s - goners
sd = sd.symmetric_difference(still_ok)
goners = goners.union(s.difference(sd))
return sd
Is there a better/more efficient/"Pythonic" way to do this?
Upvotes: 3
Views: 3313
Reputation: 6238
What about this:
from collections import Counter
s1 = set([1, 2, 3])
s2 = set([2, 3, 4])
s3 = set([2, 3, 7])
s4 = set([2, 5, 9])
print([k for k,v in Counter((*s1,*s2,*s3,*s4)).items() if v == 1])
Eventho this looks nice since it's a oneliner you have to keep in mind that it's a little slower than your own approach:
In [85]: def nicefunc(sets):
...: return [k for k,v in Counter(itertools.chain.from_iterable(sets)).items() if v == 1]
...:
In [86]: def nicefunc2(sets):
...: return [k for k,v in Counter( [i for s in sets for i in s]).items() if v == 1]
...:
In [87]: def nicefunc3():
...: return [k for k,v in Counter((*s1,*s2,*s3,*s4)).items() if v == 1]
...:
In [88]: def myfunc(sets):
...: sd = set()
...: goners = set()
...: for s in sets:
...: still_ok = s - goners
...: sd = sd.symmetric_difference(still_ok)
...: goners = goners.union(s.difference(sd))
...: return sd
...:
In [89]: sets = [s1, s2, s3, s4]
In [90]: %timeit myfunc(sets)
2.25 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [91]: %timeit nicefunc(sets)
3.64 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [92]: %timeit nicefunc2(sets)
3.79 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [94]: %timeit nicefunc3()
3.64 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
You could also pick another approach which is still a oneliner but faster:
In [152]: def coolfunc(sets):
...: return set.union(*[sets[i]-set.union(*sets[:i],*sets[i+1:]) for i in range(len(sets))])
In [153]: coolfunc(sets)
Out[153]: {1, 4, 5, 7, 9}
In [154]: %timeit coolfunc(sets)
3.34 µs ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
However, as pointed out by @VBrail you have gotten the definition of symmetric set difference of a collection of sets wrong. Here is a one-liner for calculating the actual symmetric set difference of a collection which is defined as
the symmetric difference of a collection of sets contains just elements which are in an odd number of the sets in the collection wikipedia
from functools import reduce
s1 = set([1, 2, 3])
s2 = set([2, 3, 4])
s3 = set([2, 3, 7])
s4 = set([2, 5, 9])
sets = [s1,s2,s3,s4]
reduce(set.symmetric_difference, sets)
{1, 3, 4, 5, 7, 9}
Upvotes: 3
Reputation: 4263
For operations on built-in Python objects that can be done using both operators and functions, the operator versions are generally faster than the function versions since there is overhead in accessing instance attributes and making explicit function calls. Also, performing in-place updates on collections can avoid creating extra copies of data and makes the program more efficient.
An improved version of your approach using set operators looks like this:
def myfunc_improved(sets: List[set]) -> set:
sd = set()
goners = set()
for s in sets:
sd ^= s - goners
goners |= s - sd
return sd
Performance measurements:
%timeit myfunc(sets)
%timeit myfunc_improved(sets)
3.19 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.75 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Upvotes: 3
Reputation: 1299
The itertools module is kind of useful for things like this:
import itertools as it
def only_exists_in_one_set(target):
remover = []
case = it.combinations(target, 2) #generate all combinations ignores order
while True:
try:
temp = next(case)
# AND all combos to find duplicates
remover.append(temp[0] & temp[1])
except StopIteration:
break
#flatten the nested list of sets passed to the function:
target = [x for each_set in target for x in each_set]
#flatten remover, eliminate duplicates with set
for val in set([x for each_set in remover for x in each_set]):
target = [a for a in target if a != val] #remove all duplicate values
return sorted(target)
>>> only_exists_in_one_set([{1,2,3},{2,3,4},{2,3,7},{2,5,9}])
>>> [1, 4, 5, 7, 9]
Not as concise as many approachs, but perhaps readable?
Upvotes: 1
Reputation: 117
You want a set B containing all members which are exclusively contained in one of your sets in A. What about the following (Python 3)?
from functools import reduce
A = [set([1, 2, 3]), set([2, 3, 4]), set([2, 3, 7]), set([2, 5, 9])]
B = set()
for i in range(len(A)):
U = reduce(set.union, A[:i]+A[(i+1):])
B = B.union(set.difference(A[i], U))
print(B)
{1, 4, 5, 7, 9}
Upvotes: 3
Reputation: 191
first yes your observation is wrong symmetric_difference of multiple sets is not the set of elements which occurs only in the individual set instead it is a set of elements whose total count in all set is odd, Hence the symmetric_difference of(s1,s2,s3,s4) will be {1, 3, 4, 5, 7, 9}.
def s_diff(li):
res=set()
for s in li:
res =res.symmetric_difference(s)
return res
output:
s_diff([s1,s2,s3,s4])
{1, 3, 4, 5, 7, 9}
Upvotes: 3