akash

Reputation: 587

Does Python have a maximum group reference for regex (like Perl)?

Context:
When running a regex match in Perl, $1, $2, etc. can be used to refer to the groups captured by the match. Similarly, in Python, \g<0>, \g<1>, etc. can be used.

Perl also has a special variable, $+, which refers to the captured group with the highest number.

My question:
Does Python have an equivalent of $+?

I tried \g<+> and looked in the documentation, which only says:

There’s also a syntax for referring to named groups as defined by the (?P<name>...) syntax. \g<name> will use the substring matched by the group named name, and \g<number> uses the corresponding group number. \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement string such as \g<2>0. (\20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'.) The following substitutions are all equivalent, but use all three variations of the replacement string.

Upvotes: 3

Views: 110

Answers (2)

zdim

Reputation: 66883

The captures method in the third-party regex module provides this functionality: it "returns a list of all the captures of a group." So take the last one:

>>> import regex
>>> s = 'fza'  # avoid naming the variable str, which shadows the built-in
>>> m = regex.search(r'(a)|(f)', s)
>>> print(m.captures()[-1])
f

When the string has a before f, this code prints a. For this pattern, that is the equivalent of Perl's $+. Here we get all captures, not only the highest one, and there are other related methods. Follow the word "captures" in the linked docs.


Another option that fits the intended use, explained in a comment, is the branch reset group, (?|pattern). It is also available in the regex module.

>>> import regex
>>> m = regex.search(r'(?|(a)|(b))', 'zba')
>>> m.group(1)
'b'

In short, with the branch reset (?|(pA)|(pB)|(pC)), the capture groups in every alternative share the same number, so the whole pattern has one capture group (with three alternatives), not three. So you always know which is the "last" capture: there is only one, and it holds the match. This can be used with named capture groups as well.

This feature adds far more power as the pattern in (?|...) gets more complex. Find it in your favorite regex documentation: for example, regular-expressions.info, or on the Perl side, perlre and an article in The Effective Perler.

Upvotes: 3

ikegami

Reputation: 385764

In most cases, you'd just use one capture around the alternation.

(foo|bar|baz)
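As a minimal sketch of that single-group approach with the standard re module (the sample text is made up for illustration), whichever alternative matches always lands in group 1:

```python
import re

# One capturing group wraps the whole alternation, so the matched
# alternative is always available as group 1.
text = 'xx bar yy'  # hypothetical input
m = re.search(r'(foo|bar|baz)', text)
print(m.group(1))  # bar
```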

In more complex cases, you could filter out None results.

import re
s = 'bar4'
m = re.search( r'foo([12])|bar([34])|baz([56])', s )
[ g for g in m.groups() if g is not None ]   # ['4']
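The filtering idea can be wrapped in a small helper that returns only the highest-numbered group that took part in the match, which is roughly what Perl's $+ gives. This is a sketch, and last_matched_group is a hypothetical name, not part of re. The standard Match.lastindex attribute is related but not identical: it tracks the most recently closed group, so it can differ when groups are nested.

```python
import re

def last_matched_group(m):
    """Return the text of the highest-numbered group that participated, or None."""
    for g in reversed(m.groups()):
        if g is not None:
            return g
    return None

m = re.search(r'foo([12])|bar([34])|baz([56])', 'bar4')
print(last_matched_group(m))   # 4
print(m.group(m.lastindex))    # 4 (lastindex is 2 here)

# With nested groups the two differ: the outer group closes last.
nested = re.search(r'((a)(b))', 'ab')
print(nested.lastindex)            # 1
print(last_matched_group(nested))  # b
```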

Upvotes: 2
