Note - This question is similar to this and this but I was unable to resolve my problem based on those answers.
I have a list of patterns, list_patterns, and I want an efficient way to search for a match against an input_string, so I join all of the patterns together (which should be much more efficient than looping through the patterns and checking each one for a match). However, I am less interested in whether a match exists than in which pattern matched my input string. The code below illustrates what I want:
import re
input_string = 'foobar 11 the'
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
joined_patterns = r'|'.join(list_patterns)
print(joined_patterns)
# OUT: ^foobar \d+$|^foobar [a-z]+$|^foobar \d+ [a-z]+$
compiled_patterns = re.compile(joined_patterns)
print(compiled_patterns.search(input_string).span())
# OUT: (0, 13)
# Desired (hypothetical) method: returns the index of the pattern that matched, here the third pattern (index 2)
print(compiled_patterns.search(input_string).pattern_group())
# OUT: 2
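For reference, the loop-based baseline that the joined pattern is meant to replace can be sketched as follows. This is only an illustration of the desired result; the helper name first_matching_index is hypothetical and not part of the re module.

```python
import re

# Hypothetical helper; shows the desired result using the plain loop
# that the joined pattern is supposed to avoid.
def first_matching_index(patterns, text):
    """Return the index of the first pattern that matches text, or None."""
    for idx, pattern in enumerate(patterns):
        if re.search(pattern, text):
            return idx
    return None

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
print(first_matching_index(list_patterns, 'foobar 11 the'))
# OUT: 2
```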
Upvotes: 1
Views: 113
You could encapsulate your logic in a small class:
import re
input_string = 'foobar 11 the'
class MatchPattern:
    list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

    def __init__(self):
        # Wrap each pattern in a named group so a match can be traced back to it
        joined = "|".join(rf"(?P<group_{idx}>{pattern})" for idx, pattern in enumerate(self.list_patterns))
        self.joined_patterns = re.compile(joined)

    def match(self, string):
        m = self.joined_patterns.search(string)
        if m:
            # The named group that is not None belongs to the pattern that matched
            group = [name for name, value in m.groupdict().items() if value is not None][0]
            _, idx = group.split("_")
            return (group, self.list_patterns[int(idx)])
        else:
            return None
mp = MatchPattern()
group = mp.match(input_string)
print(group)
# ('group_2', '^foobar \\d+ [a-z]+$')
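A shorter variant of the same idea, as a sketch: the match object's lastgroup attribute gives the name of the last matched capturing group, which is how the tokenizer example in the re documentation identifies the matching alternative. The p<index> group naming and the matching_index helper are assumptions made for this illustration.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

# Wrap each pattern in a named group; the "p<index>" naming is an arbitrary choice.
joined = re.compile("|".join(f"(?P<p{i}>{p})" for i, p in enumerate(list_patterns)))

def matching_index(text):
    """Return the index of the alternative that matched, or None."""
    m = joined.search(text)
    if m is None:
        return None
    # lastgroup is the name of the last matched capturing group,
    # which here is the wrapper group of the alternative that matched.
    return int(m.lastgroup[1:])

print(matching_index('foobar 11 the'))
# OUT: 2
```

This avoids scanning groupdict() for the one value that is not None.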
Upvotes: 2
Wrap each pattern in its own capturing group, then find which group is not None.
import re
input_string = 'foobar 11 the'
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
joined_patterns = '(' + r')|('.join(list_patterns) + ')'
compiled_patterns = re.compile(joined_patterns)
print(compiled_patterns.pattern)
# (^foobar \d+$)|(^foobar [a-z]+$)|(^foobar \d+ [a-z]+$)
match = compiled_patterns.match(input_string)
i = next(i for i, g in enumerate(match.groups()) if g is not None)
matching_pattern = list_patterns[i]
print(matching_pattern)
# ^foobar \d+ [a-z]+$
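Note that the code above raises an AttributeError when nothing matches, because match is then None. A guarded sketch follows, assuming that none of the patterns contains capturing groups of its own, so that group number N corresponds to list_patterns[N - 1]; the matching_pattern helper is a hypothetical name.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
compiled_patterns = re.compile('(' + ')|('.join(list_patterns) + ')')

def matching_pattern(text):
    """Return the pattern that matched text, or None if none did."""
    match = compiled_patterns.match(text)
    if match is None:
        return None
    # lastindex is the 1-based number of the last matched capturing group.
    # Assumption: the patterns define no capturing groups themselves.
    return list_patterns[match.lastindex - 1]

print(matching_pattern('foobar 11 the'))
# ^foobar \d+ [a-z]+$
print(matching_pattern('no match here'))
# None
```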
Upvotes: 2