brian4342

Reputation: 1253

Python intersection with substrings

I have two sets:

a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])

I want to find whether any element of (b) occurs in (a), including as a substring. Normally I would do:

c = a.intersection(b)

However, in this example it would return an empty set, as 'apple' != 'apple!'.

Assuming I cannot remove characters from (a), and hopefully without writing loops, is there a way for me to find a match?

Edit: I would like it to return the match from (b). E.g. I would like to know whether 'apple' is in set (a); I do not want it to return 'apple!'.

Upvotes: 6

Views: 2417

Answers (3)

Padraic Cunningham

Reputation: 180441

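Using sets is actually of little benefit if you are not searching for exact matches. If the words in a always start with the substring you are looking for (a prefix match), sorting and bisecting will be a more efficient approach, i.e. O(n log n) vs O(n^2):

from bisect import bisect_left

a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])

srt = sorted(a)

def has_prefix_match(word):
    # Any string that starts with word sorts at or right after word's insertion point.
    i = bisect_left(srt, word)
    # The bounds check avoids an IndexError when word sorts after every element.
    return i < len(srt) and srt[i].startswith(word)

inter = [word for word in b if has_prefix_match(word)]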

Upvotes: 1

Ozgur Vatansever

Reputation: 52163

Instead of doing the equality check via ==, you can use the in operator for a substring match, which also covers equality:

>>> [x for ele in a for x in b if x in ele]
['apple']
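
Note that the comprehension above yields a word from b once for every element of a that contains it, so the same word can appear more than once. A minimal sketch, assuming what is wanted is the set of words from b that match at least once (the name matches is illustrative, not part of the original answer):

```python
a = {'this', 'is', 'an', 'apple!'}
b = {'apple', 'orange'}

# Keep each word from b that occurs as a substring of at least one element of a.
matches = {word for word in b if any(word in element for element in a)}
print(matches)  # {'apple'}
```

This is still a nested loop over both sets; it only changes the shape of the result.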

Upvotes: 7

Alex Hall

Reputation: 36033

The best thing to do is:

any(x in y for x in b for y in a)

It's a loop, but you can't escape that. Any solution will at least have an implied loop somewhere.
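
The any() call only answers True or False, while the edit to the question asks for the matching word from b. One possible variation, sketched here with a hypothetical helper name first_match, uses next() with a default so that it still stops at the first hit:

```python
def first_match(a, b):
    """Return one word from b that is a substring of some element of a, else None."""
    # next() stops at the first hit, so this short-circuits just like any().
    return next((x for x in b for y in a if x in y), None)

a = {'this', 'is', 'an', 'apple!'}
b = {'apple', 'orange'}
print(first_match(a, b))  # apple
```

If several words from b match, which one comes back depends on set iteration order, so this is only suitable when any single match will do.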

Upvotes: 0
