Reputation: 12727
Perl makes it easy to construct readable regular expressions using the /x modifier. This modifier lets you write a regular expression string in which all whitespace is ignored. In other words, logical parts of the regular expression can be separated by whitespace or even line breaks, which greatly improves readability. In Python, the only way I see of doing this is to construct such a regular expression string, remove the whitespace from it in an intermediate step, and then use the resulting string for matching. Is there a more elegant way of doing this?
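For comparison, the intermediate-step approach described in the question might look like the sketch below. `compile_stripped` is a hypothetical helper name, not part of any library, and it simply uses `re.sub` to drop every whitespace character. The sketch also shows the weakness of that approach: whitespace that is meant to be matched literally is thrown away too, and the helper has no notion of comments.

```python
import re

# Hypothetical helper (not part of any library): strips every whitespace
# character from a "readable" pattern and compiles what is left.
def compile_stripped(readable_pattern):
    compact = re.sub(r"\s+", "", readable_pattern)
    return re.compile(compact)

# Works for patterns whose whitespace is purely cosmetic...
decimal = compile_stripped(r"""
    \d+     \.     \d*
""")
print(bool(decimal.fullmatch("3.14")))  # True

# ...but silently breaks a pattern that needs a literal space.
greeting = compile_stripped(r"hello world")
print(greeting.pattern)  # helloworld
```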
Upvotes: 10
Views: 1184
Reputation: 1121834
Yes, by setting the re.X / re.VERBOSE flag:
This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like *?, (?: or (?P<...>. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

That means that the two following regular expression objects that match a decimal number are functionally equal:
import re

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
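The exceptions in the quoted rule matter when a verbose pattern must match an actual space. The sketch below (the pattern and the `greeting` name are illustrative only) keeps one space by putting it in a character class and another by escaping it with a backslash, while the bare spaces and the # comments are dropped.

```python
import re

# Bare whitespace and "#" comments are dropped in verbose mode, but a space
# inside a character class or preceded by a backslash is matched literally.
greeting = re.compile(r"""
    hello        # the word 'hello'
    [ ]          # one literal space, kept because it is inside [...]
    big\ world   # 'big', an escaped (literal) space, then 'world'
""", re.VERBOSE)

print(bool(greeting.fullmatch("hello big world")))  # True
print(bool(greeting.fullmatch("hellobigworld")))    # False
```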
This is pretty much exactly like Perl's /x flag.
Since Python 3.6, you can also control the same flag for just a subsection of your pattern with the (?x:...) (enable) and (?-x:...) (disable) groupings.
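A minimal sketch of the (?x:...) grouping, assuming Python 3.6 or later: the pattern as a whole is not verbose, so the space after "name:" is matched literally, but inside the group whitespace and the # comment are ignored. The pattern itself is only an illustration.

```python
import re

# Only the (?x:...) group is verbose; "name: " outside it is matched as-is.
pattern = re.compile(r"""name: (?x:
    [a-z]+   # lowercase letters; whitespace and this comment are ignored
)""")

print(bool(pattern.fullmatch("name: abc")))  # True
print(bool(pattern.fullmatch("name:abc")))   # False, the space is required
```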
Upvotes: 11
Reputation: 70732
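To add to this: inline modifiers can be placed within a regular expression to enforce the relevant matching behavior on the given expression. In Python, a global inline modifier such as (?x) applies to the entire regular expression and must appear at the very start of the pattern (placing it elsewhere has been deprecated since Python 3.6 and is an error since Python 3.11). Python also does not support standalone negated modifiers such as (?-ismx); only the scoped form, e.g. (?-x:...), can switch a flag off.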
import re

pattern = re.compile(r'''(?x)
    \d+  (?# Some numbers)
    \s+  (?# Whitespace)
    \d+  (?# More numbers)
''')
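Two details of that example can be checked directly. The (?#...) comment groups are ignored whether or not verbose mode is on, and a leading (?x) has the same effect as passing re.VERBOSE; the resulting flag shows up in the compiled pattern's flags attribute. The patterns below are illustrative only.

```python
import re

# (?#...) is a comment group: it is ignored in any mode, verbose or not.
plain = re.compile(r"\d+(?# some numbers)-\d+(?# more numbers)")
print(bool(plain.fullmatch("12-34")))  # True

# A (?x) at the very start turns on verbose mode for the whole pattern,
# just as passing re.VERBOSE would, and is recorded in .flags.
inline = re.compile(r"(?x) \d+ - \d+")
print(bool(inline.flags & re.VERBOSE))  # True
print(bool(inline.fullmatch("12-34")))  # True
```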
The way around that is to use the third-party regex module (installed from PyPI), in which, under its V1 behavior (regex.V1), inline modifiers apply to the end of the group or pattern and can be turned on or off:
import regex
pattern = regex.compile(r'(?x) \d+ (?-x)[a-z]+(?x) \d+', regex.V1)
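For comparison, on Python 3.6+ the standard re module can express a similar toggle only in scoped form, because it has no unscoped (?-x). The sketch below is a variation of the regex example above, not a drop-in equivalent: literal spaces were added inside the group so that switching verbose mode off has a visible effect.

```python
import re

# Verbose mode is on globally via a leading (?x); the scoped (?-x:...) group
# switches it off only inside the group, so the spaces around [a-z]+ are
# matched literally there.
pattern = re.compile(r"(?x) \d+ (?-x: [a-z]+ ) \d+")

print(bool(pattern.fullmatch("123 abc 456")))  # True
print(bool(pattern.fullmatch("123abc456")))    # False
```

The trade-off is that each non-verbose region has to be wrapped in its own group, whereas the regex module lets the flags be flipped at arbitrary points.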
Upvotes: 3