Reputation: 3733
I'm trying to compute the intersection of 2 sets in Python and I'd like to consider values that are within a certain range (+/- 5) as being equal. So for example:
set1 = [22, 570, 233, 127, 92]
set2 = [897, 27, 673, 231, 45]
What I want is for:
len[x for x in set1 if x in set2] == 2
where the intersections are 22 & 27 as well as 233 & 231.
Is there an efficient way of doing this?
Upvotes: 1
Views: 154
Reputation: 155418
At the expense of memory, for smallish ranges (your example said +-5), you could just make larger set
s for the check.
from itertools import chain
set1 = {22, 570, 233, 127, 92}
set2 = {897, 27, 673, 231, 45}
# Make a frozenset of all values considered overlapping with set2
overlap = 5
overlaprange = range(-overlap, overlap+1)
match2 = frozenset(chain.from_iterable((x + i for i in overlaprange) for x in set2))
print(len(set1 & match2))
This will never count a single element more than once, and if the values in set2
are close together, doesn't store their overlap more than once (conversion to frozenset
for match2
uniquifies). If one of the set
s is being matched against several other set
s, you'd convert the reused set
to a "match set
", then use that over and over for each intersection check.
Upvotes: 0
Reputation: 76194
If by "efficient" you mean "short", sure.
>>> set1 = [22, 570, 233, 127, 92]
>>> set2 = [897, 27, 673, 231, 45]
>>> len([x for x in set1 if any(abs(x-y) <= 5 for y in set2)])
2
If by "efficient" you mean "with a reasonable big-O run time", that's not so easy. The above approach is O(N^2). You could get O(N * log(N)) if you sort both sets and iterate over them in parallel looking for approximate matches, but it's not entirely trivial to do.
[edit - removed O(N log N) code that worked for the OP's input but not for set1=[1,3,4] and set2=[2,10]]
Upvotes: 1