Reputation: 3924
The answer to the JavaScript regex question "Return the part of the regex that matched" is "No, because compilation destroys the relationship between the regex text and the matching logic."
But Python preserves match objects, and m.groups() shows which group(s) took part in a match. It should be simple to preserve the regex text of each group as part of a match object and return it, but there doesn't appear to be a call to do so.
import re

# Raw string, so that \d, \w and \W reach the regex engine unchanged.
pat = r"(^\d+$)|(^\w+$)|(^\W+$)"
test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
for t in test:
    m = re.search(pat, t)
    s = (m.lastindex, m.groups()) if m else ''
    print(str(bool(m)), s)
This returns:
True (2, (None, 'a', None))
True (2, (None, 'c3', None))
True (2, (None, '36d', None))
True (1, ('51', None, None))
False
True (3, (None, None, '#$%&'))
The compiler obviously knows that there are three groups in this pattern. Is there a way to extract the subpattern in each group in a regex, with something like:
>>> print(m.regex_group_text)
('^\d+$', '^\w+$', '^\W+$')
Yes, it would be possible to write a custom pattern parser, for example to split on '|' for this particular pattern. But it would be far easier and more reliable to use the re compiler's understanding of the text in each group.
Upvotes: 5
Views: 136
Reputation: 5210
If the indices are not sufficient and you absolutely need to know the exact part of the regex, there is probably no other possibility but to parse the expression's groups on your own.
All in all, this is no big deal, since you can simply count opening and closing parentheses and log their indices:
def locateBraces(inp):
    bracePositions = []
    braceStack = []
    for i, ch in enumerate(inp):
        if ch == '(':
            braceStack.append(i)
        elif ch == ')':
            # Check before popping, so an unmatched ')' raises SyntaxError rather than IndexError.
            if not braceStack:
                raise SyntaxError('Too many closing braces.')
            bracePositions.append((braceStack.pop(), i))
    if braceStack:
        raise SyntaxError('Too many opening braces.')
    # Sort by opening position so that entry i belongs to group number i + 1, even for nested groups.
    return sorted(bracePositions)
Edited: This dumb implementation only counts opening and closing braces. However, regexes may contain escaped braces, e.g. \(, which this method counts as regular group-defining braces. You may want to adapt it to skip braces that have an odd number of backslashes right before them. I leave this issue as a task for you ;)
With this function, your example becomes:
import re

pat = r"(^\d+$)|(^\w+$)|(^\W+$)"
bloc = locateBraces(pat)
test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
for t in test:
    m = re.search(pat, t)
    print(str(bool(m)), end='')
    if m:
        # lastindex is 1-based; bloc is 0-based.
        h = bloc[m.lastindex - 1]
        print(' %s' % (pat[h[0]:h[1] + 1]))
    else:
        print()
Which returns:
True (^\w+$)
True (^\w+$)
True (^\w+$)
True (^\d+$)
False
True (^\W+$)
Edited: To get the list of your groups, of course a simple comprehension would do:
gtxt = [pat[b[0]:b[1] + 1] for b in bloc]
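As a rough sketch of the adaptation mentioned above (skipping escaped braces), something like the following might work. The name locate_groups is made up for this example and is not part of re. It also ignores parentheses inside character classes and non-capturing (?...) groups, but it makes no attempt to handle (?#...) comments or verbose-mode patterns, so treat it as a starting point rather than a full regex parser.

```python
import re

def locate_groups(pattern):
    """Return (start, end) spans of capturing groups, ordered by group number."""
    spans = []
    stack = []          # entries: (index of '(', is_capturing)
    in_class = False    # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            i += 2      # skip the escaped character, e.g. \( or \]
            continue
        if in_class:
            if ch == ']':
                in_class = False
        elif ch == '[':
            in_class = True
            # A ']' placed first in a class (after an optional '^') is literal.
            if pattern[i + 1:i + 2] == '^':
                i += 1
            if pattern[i + 1:i + 2] == ']':
                i += 1
        elif ch == '(':
            # '(?...' is an extension; of those, only '(?P<name>...' captures.
            capturing = (pattern[i + 1:i + 2] != '?'
                         or pattern.startswith('(?P<', i))
            stack.append((i, capturing))
        elif ch == ')':
            if not stack:
                raise ValueError('unbalanced parenthesis at position %d' % i)
            start, capturing = stack.pop()
            if capturing:
                spans.append((start, i))
        i += 1
    if stack:
        raise ValueError('missing closing parenthesis')
    # Groups are numbered by their opening parenthesis, so sort by start.
    return sorted(spans)

pat = r"(^\d+$)|(^\w+$)|(^\W+$)"
spans = locate_groups(pat)
# Sanity check against the group count that the re compiler reports.
assert len(spans) == re.compile(pat).groups
print([pat[s:e + 1] for s, e in spans])
```

Comparing the number of spans with re.compile(pat).groups is a cheap way to notice when the scanner and the real compiler disagree about a pattern.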
Upvotes: 5
Reputation: 1991
It will remain up to you to track which regular expressions you are feeding into re.search. Something like:
import re

patts = {
    'a': r'\d+',
    'b': r'^\w+',
    'c': r'\W+'
}
# Wrap the alternation in (?:...) so that ^ and $ anchor every alternative, not just the first and last.
pat = '^(?:' + '|'.join('({})'.format(x) for x in patts.values()) + ')$'
test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
for t in test:
    m = re.search(pat, t)
    if m:
        for g in m.groups():
            for key, regex in patts.items():
                # fullmatch, so that e.g. \d+ is not credited for 'c3'.
                if g and re.fullmatch(regex, g):
                    print("t={} matched regex={} ({})".format(t, key, regex))
                    break
Upvotes: 2
Reputation: 309841
This may or may not be helpful depending on the problem that you are actually trying to solve ... But Python lets you name the groups:
r = re.compile(r'(?P<int>^\d+$)|(?P<word>^\w+$)')
From there, when you have a match, you can inspect the groupdict to see which groups are present:
r.match('foo').groupdict() # {'int': None, 'word': 'foo'}
r.match('10').groupdict() # {'int': '10', 'word': None}
Of course, this doesn't tell you the exact regular expression associated with the match. You'd need to keep track of that yourself based on the group name.
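One way to do that bookkeeping, as a minimal sketch: keep the subpatterns in a dict keyed by group name, build the combined pattern from it, and use the match object's lastgroup attribute to look the text back up. The names subpatterns, combined and matched_subpattern are invented for this example, and it assumes the keys are valid group names and that exactly one top-level alternative matches.

```python
import re

# Hypothetical names; each key labels the subpattern it maps to.
subpatterns = {
    'int': r'^\d+$',
    'word': r'^\w+$',
    'symbols': r'^\W+$',
}

# Build '(?P<int>^\d+$)|(?P<word>^\w+$)|(?P<symbols>^\W+$)'.
combined = re.compile('|'.join(
    '(?P<{}>{})'.format(name, sub) for name, sub in subpatterns.items()))

def matched_subpattern(text):
    """Return (group name, subpattern text) for the alternative that matched, or None."""
    m = combined.search(text)
    if m is None:
        return None
    # lastgroup is the name of the last matched capturing group.
    return m.lastgroup, subpatterns[m.lastgroup]

for t in ['a', 'c3', '36d', '51', '29.5', '#$%&']:
    print(t, matched_subpattern(t))
```

Because the pattern is generated from the dict, the group names and the regex text cannot drift apart.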
If you really want to go beyond this, you probably want something a little more sophisticated than simple regular expression parsing. In that case, I might suggest something like pyparsing. Don't let the old-school styling on the website fool you (or the lack of a PEP-8 compliant API); the library is actually quite powerful once you get used to it.
Upvotes: 4