Reputation: 110380
Let's say I have the following string:
/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt
How would I determine the fastest way to search this string for a match given input terms ['sicario', '419']
For example, the most basic version would be:
1) String contains:
s = '/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt'
terms = ['sicario', '419']
has_match = all([term.lower() in s.lower() for term in terms])
2) Regex
has_match = all([re.compile(term, re.I).search(s) for term in terms])
Other possible options would be:
What would be the time complexity of the various algorithms?
Here is what I got for timing regex
vs str() in
:
import timeit
# Case insensitive
setup = 's="/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt"; terms = ["sicario", "419"]'
print min(timeit.Timer('all([term in s.lower() for term in terms])', setup=setup).repeat(7, 1000))
0.00134181976318
# Case sensitive
setup = 's="/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt"; terms = ["sicario", "419"]'
print min(timeit.Timer('all([term in s for term in terms])', setup=setup).repeat(7, 1000))
0.000231027603149
# Regex case insensitive
setup = 'import re; s="/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt"; compiled_terms = [re.compile("sicario", re.I), re.compile("419", re.I)]'
print min(timeit.Timer('all([compiled_term.search(s) for compiled_term in compiled_terms])', setup=setup).repeat(7, 1000))
0.00111889839172
# Regex case sensitive
setup = 'import re; s="/Volumes/01/LG_SICARIO_ES419SUB_16X9_240_2398_DIGITAL_FINAL.itt"; compiled_terms = [re.compile("sicario"), re.compile("419")]'
print min(timeit.Timer('all([compiled_term.search(s) for compiled_term in compiled_terms])', setup=setup).repeat(7, 1000))
0.000588893890381
It's pretty close, though case-sensitive string searches perform about 2x better than regex (at least on this input data).
Upvotes: 2
Views: 382
Reputation: 17322
I think your basic version is the fastest (a combination of Boyer-Moore and Horspoo) available (sublinear search behaviour in good cases (O(n/m)), I will add small changes to your basic version:
def test():
lower_s = s .lower()
return all([term in lower_s for term in terms])
Upvotes: 1