Reputation: 532
I have a string. I need to know if any of the following substrings appear in the string.
So, if I have:
thing_name = "VISA ASSESSMENTS"
I've been doing my searches with:
any((_ in thing_name for _ in ['ASSESSMENTS','KILOBYTE','INTERNATIONAL']))
I'm going through a long list of thing_name items, and I don't need to filter, exactly, just check for any number of substrings.
Is this the best way to do this? It feels wrong, but I can't think of a more efficient way to pull this off.
Upvotes: 2
Views: 178
Reputation: 1677
You can try re.search
to see if that is faster. Something along the lines of
import re
pattern = re.compile('|'.join(['ASSESSMENTS','KILOBYTE','INTERNATIONAL']))
isMatch = (pattern.search(thing_name) != None)
Upvotes: 1
Reputation:
If your list of substrings is small and the input is small, then using a for loop to do compares is fine.
Otherwise the fastest way I know to search a string for a (large) list of substrings is to construct a DAWG of the word list and then iterate through the input string, keeping a list of DAWG traversals and registering the substrings at each successful traverse.
Another way is to add all the substrings to a hashtable and then hash every possible substring (up to the length of the longest substring) as you traverse the input string.
It's been a while since I've worked in python, my memory of it is that it's slow to implement stuff in. To go the DAWG route, I would probably implement it as a native module and then use it from python (if possible). Otherwise, I'd do some speed checks to verify first but probably go the hashtable route since there are already high performance hashtables in python.
Upvotes: 0