Reputation: 819
Text
is a large string, patterns
is a list of short (fixed) strings.
My code is:
def BruteForcePatternMatching(text, patterns):
indices = []
for pattern in patterns:
for index in range(len(text) - 1):
slide = text[index : index + len(pattern) - 1]
if pattern == slide:
indices.append(index)
return indices
My question is if there is a 'pythonic' way to extract a 'slide' from text, of iterated size.
Upvotes: 0
Views: 259
Reputation: 33950
You're trying to find matches to a list, patterns
, inside your text.
If patterns
is just fixed strings, use string.find()
(although note that will only find the first occurrence - see below).
s = 'cat dog cow dog' # Note 'dog' occurs multiply
s.find('dog')
4
s.find('cow')
8
s.find('cat')
0
More generally, if patterns either has regexes, or has duplicate fixed strings, use re.findall/iter()
. See the many duplicate questions here for examples.
import re
pat = re.compile(r'(cat|dog|cow)')
pat.findall("The cat and cow sat on the dog's catalog of doggerel")
# ['cat', 'cow', 'dog', 'cat', 'dog']
And if you also need the indices where matches occur, use re.finditer()
as @khachik showed.
Note we combined all the patterns into one regex, then used one re.findall()
call to search for all matches of every pattern inside all of text.
Your current code is very inefficient: it's trying to test for string-equality by running a sliding-window over text, and has nested loops: one covers all N indices in text, and all P patterns; this will be order O(N*P), which is not scalable.
Upvotes: 2
Reputation: 28693
Assuming your patterns don't contain any special regex characters or are regexes themselves:
import re
indices = [(m.start(), m.end()) for m in re.finditer("|".join(patterns), text)]
Upvotes: 1