Dipole

Reputation: 1910

Regex: How to get span or group of *pattern* which matches input string

Note: this question is similar to this and this, but I was unable to resolve my problem based on those answers.

I have a list of patterns, list_patterns, and I want an efficient way to search for a match against an input_string, so I join all of the patterns into a single alternation (which should be faster than looping through the patterns and testing each one). However, I am less interested in whether a match exists than in which pattern matched my input string. The code below illustrates what I want:

import re
input_string = 'foobar 11 the'
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
joined_patterns = r'|'.join(list_patterns)

print(joined_patterns)
# OUT:  ^foobar \d+$|^foobar [a-z]+$|^foobar \d+ [a-z]+$

compiled_patterns = re.compile(joined_patterns)

print(compiled_patterns.search(input_string).span())
# OUT: (0, 13)

# Desired: a (hypothetical) method that returns the index of the matching pattern, here the third (index 2)
print(compiled_patterns.search(input_string).pattern_group())
# OUT: 2

Upvotes: 1

Views: 113

Answers (2)

Jan

Reputation: 43169

You could encapsulate your logic in a small class:

import re

input_string = 'foobar 11 the'

class MatchPattern:
    list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
    joined_patterns = ''

    def __init__(self):
        joined = "|".join(rf"(?P<group_{idx}>{pattern})" for idx, pattern in enumerate(self.list_patterns))
        self.joined_patterns = re.compile(joined)

    def match(self, string):
        m = self.joined_patterns.search(string)
        if m:
            # Compare against None so that a pattern matching the empty string still counts.
            group = [name for name, value in m.groupdict().items() if value is not None][0]
            _, idx = group.split("_")
            return (group, self.list_patterns[int(idx)])
        else:
            return None

mp = MatchPattern()
group = mp.match(input_string)
print(group)
# ('group_2', '^foobar \\d+ [a-z]+$')
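
As a possible simplification of the same idea (a sketch, not part of the code above): when each pattern is wrapped in its own named group, the match object's lastgroup attribute already holds the name of the wrapper that matched, so the scan over groupdict() can be dropped. The "p" prefix and the function name which_pattern are arbitrary choices made for this illustration.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

# Wrap each pattern in a named group. The "p" prefix is an arbitrary choice,
# needed only because group names must be valid identifiers.
combined = re.compile(
    "|".join(f"(?P<p{idx}>{pattern})" for idx, pattern in enumerate(list_patterns))
)

def which_pattern(text):
    """Return (index, pattern) for the alternative that matched, or None."""
    m = combined.search(text)
    if m is None:
        return None
    # lastgroup names the last capturing group to close, which is the wrapper.
    idx = int(m.lastgroup[1:])  # strip the "p" prefix
    return idx, list_patterns[idx]

print(which_pattern('foobar 11 the'))
# (2, '^foobar \\d+ [a-z]+$')
```

This is the same lookup the tokenizer example in the re documentation relies on. It assumes the individual patterns do not define named groups that clash with the wrapper names.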

Upvotes: 2

Tomalak

Reputation: 338248

Wrap each pattern in its own capturing group, then find which group is not None.

import re

input_string = 'foobar 11 the'
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

joined_patterns = '(' + r')|('.join(list_patterns) + ')'
compiled_patterns = re.compile(joined_patterns)

print(compiled_patterns.pattern)
# (^foobar \d+$)|(^foobar [a-z]+$)|(^foobar \d+ [a-z]+$)

match = compiled_patterns.match(input_string)

i = next(i for i, g in enumerate(match.groups()) if g is not None)
matching_pattern = list_patterns[i]

print(matching_pattern)
# ^foobar \d+ [a-z]+$
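
Two cases the snippet above does not cover are an input that matches nothing (match is None) and patterns that contain capturing groups of their own, which shift the group numbers so that they no longer line up with list indices. A minimal sketch of one way to handle both, assuming the patterns use no numbered backreferences; the function name first_matching_index and the example patterns are made up for this illustration:

```python
import re

def first_matching_index(patterns, text):
    """Return the index of the alternative that matched, or None if none did."""
    wrapper_to_index = {}
    group_number = 1
    for idx, pattern in enumerate(patterns):
        wrapper_to_index[group_number] = idx
        # Skip this wrapper plus any capturing groups inside the pattern itself.
        group_number += 1 + re.compile(pattern).groups

    combined = re.compile("|".join(f"({p})" for p in patterns))
    m = combined.search(text)
    if m is None:
        return None
    # lastindex is the number of the last group to close, i.e. the wrapper.
    return wrapper_to_index[m.lastindex]

# Hypothetical variants of the patterns, each with inner capturing groups.
patterns_with_groups = [r'^foobar (\d+)$', r'^foobar ([a-z]+)$', r'^foobar (\d+) ([a-z]+)$']

print(first_matching_index(patterns_with_groups, 'foobar 11 the'))
# 2
```

Pattern.groups reports how many capturing groups a compiled pattern contains, which is what lets the wrapper numbers be computed up front.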

Upvotes: 2
