Josh English
Josh English

Reputation: 532

Searching string for different substrings

I have a string. I need to know if any of the following substrings appear in the string.

So, if I have:

thing_name = "VISA ASSESSMENTS"

I've been doing my searches with:

any((_ in thing_name for _ in ['ASSESSMENTS','KILOBYTE','INTERNATIONAL']))

I'm going through a long list of thing_name items, and I don't need to filter, exactly, just check for any number of substrings.

Is this the best way to do this? It feels wrong, but I can't think of a more efficient way to pull this off.

Upvotes: 2

Views: 178

Answers (2)

Subhasis Das
Subhasis Das

Reputation: 1677

You can try re.search to see if that is faster. Something along the lines of

import re
pattern = re.compile('|'.join(['ASSESSMENTS','KILOBYTE','INTERNATIONAL']))
isMatch = (pattern.search(thing_name) != None)

Upvotes: 1

user2694688
user2694688

Reputation:

If your list of substrings is small and the input is small, then using a for loop to do compares is fine.

Otherwise the fastest way I know to search a string for a (large) list of substrings is to construct a DAWG of the word list and then iterate through the input string, keeping a list of DAWG traversals and registering the substrings at each successful traverse.

Another way is to add all the substrings to a hashtable and then hash every possible substring (up to the length of the longest substring) as you traverse the input string.

It's been a while since I've worked in python, my memory of it is that it's slow to implement stuff in. To go the DAWG route, I would probably implement it as a native module and then use it from python (if possible). Otherwise, I'd do some speed checks to verify first but probably go the hashtable route since there are already high performance hashtables in python.

Upvotes: 0

Related Questions