Reputation: 587
Context:
When running a regex match in Perl, $1, $2 can be used to refer to the groups captured by the match. Similarly, in Python, \g<0>, \g<1> can be used in a replacement string (\g<0> being the whole match).
Perl also has a special variable, $+, which refers to the highest-numbered group that actually captured something.
My question:
Does Python have an equivalent of $+?
I tried \g<+> and looked in the documentation, which only says:
There's also a syntax for referring to named groups as defined by the (?P<name>...) syntax. \g<name> will use the substring matched by the group named name, and \g<number> uses the corresponding group number. \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement string such as \g<2>0. (\20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'.) The following substitutions are all equivalent, but use all three variations of the replacement string.
Upvotes: 3
Views: 110
Reputation: 66883
The captures method in the third-party regex module provides the same functionality: it "returns a list of all the captures of a group." So take the last one:
>>> import regex
>>> str = 'fza'
>>> m = regex.search(r'(a)|(f)', str)
>>> print(m.captures()[-1])
f
When the string has an a before the f, this code prints a. This is the exact equivalent of Perl's $+. Here we get all captures, not only the highest one, and there are other related methods; follow the word "captures" in the linked docs.
Another option that fits the intended use, explained in a comment, is the branch reset group, (?|pattern). It is also available in the regex module.
>>> import regex
>>> m = regex.search(r'(?|(a)|(b))', 'zba')
>>> m.group(1)
'b'
In short, with the branch reset (?|(pA)|(pB)|(pC)) the groups in the three alternatives share the same number, so the whole pattern has one capture group, not three. You always know which is the "last" capture, as there is only one, and it holds the match. This can be used with named capture groups as well.
This feature adds far more power as the pattern in (?|...) gets more complex. Find it in your favorite regex documentation. Here it is in regular-expressions.info, for example, and here are some Perl resources, in perlre and an article in The Effective Perler.
Upvotes: 3
Reputation: 385764
In most cases, you'd just use one capture around the alternation.
(foo|bar|baz)
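As a minimal sketch of that simple case, using the standard-library re module (the input string and the bracket wrapping are made up for illustration): because one group wraps the whole alternation, the replacement string can always refer to \g<1>, whichever alternative matched.

```python
import re

# Hypothetical input, for illustration only.
text = 'foo and baz'

# One group wraps the whole alternation, so the matched
# alternative is always available as group 1.
wrapped = re.sub(r'(foo|bar|baz)', r'[\g<1>]', text)
print(wrapped)  # [foo] and [baz]
```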
In more complex cases, you could filter out None results.
import re
s = 'bar4'
m = re.search( r'foo([12])|bar([34])|baz([56])', s )
[ g for g in m.groups() if g is not None ] # ['4']
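Since the question is about replacement strings, the filtering idea above could be wrapped in a small helper and passed to re.sub as a function. This is only a sketch: last_captured is a hypothetical name, not part of re, and it returns None when no group took part in the match. For flat alternations like this one, m.lastindex from the standard library gives the same group; with nested groups it reports the group that closed last, so it may not agree with Perl's $+ there.

```python
import re

def last_captured(m):
    """Return the highest-numbered group that took part in the match, or None."""
    for g in reversed(m.groups()):
        if g is not None:
            return g
    return None

pattern = re.compile(r'foo([12])|bar([34])|baz([56])')

m = pattern.search('bar4')
print(last_captured(m))      # 4
print(m.group(m.lastindex))  # 4 (same result for this flat alternation)

# re.sub accepts a function in place of a replacement string;
# it is called with each match object and returns the replacement text.
result = pattern.sub(lambda mo: '<' + last_captured(mo) + '>', 'foo1 bar4 baz6')
print(result)  # <1> <4> <6>
```

Every alternative in this pattern contains a group, so the helper never returns None inside the substitution; a pattern that can match without any group participating would need a fallback.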
Upvotes: 2