Phonon

Reputation: 12727

Is there a Python equivalent to the Perl "/x" modifier for regular expressions?

Perl makes it easy to construct readable regular expressions using the /x modifier. This modifier lets you write regular expression strings in which all whitespace is ignored. In other words, logical parts of the regular expression can be separated by whitespace or even line breaks, which greatly improves readability. In Python, the only way I see of doing this is to construct such a regular expression string, remove the whitespace from it in an intermediate step, and then use the resulting string for matching. Is there a more elegant way of doing this?

Upvotes: 10

Views: 1184

Answers (2)

Martijn Pieters

Reputation: 1121834

Yes, by setting the re.X / re.VERBOSE flag:

This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like *?, (?: or (?P<...>. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.
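The exceptions in that description are easy to trip over, so here is a small sketch of them. The variable names are only illustrative; the patterns use nothing beyond the standard re module.

```python
import re

# Under re.VERBOSE a bare space is dropped from the pattern.
ignored = re.compile(r"a b", re.VERBOSE)      # behaves like r"ab"

# A space survives if it is escaped or placed inside a character class.
escaped = re.compile(r"a\ b", re.VERBOSE)     # behaves like r"a b"
in_class = re.compile(r"a[ ]b", re.VERBOSE)   # behaves like r"a b"

# A '#' inside a character class is literal; an unescaped one starts a comment.
hash_literal = re.compile(r"a [#] b  # trailing comment", re.VERBOSE)

print(bool(ignored.fullmatch("ab")), bool(ignored.fullmatch("a b")))
print(bool(escaped.fullmatch("a b")), bool(in_class.fullmatch("a b")))
print(bool(hash_literal.fullmatch("a#b")))
```

So to match a literal space or # in a verbose pattern, escape it or wrap it in [...].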

That means that the two following regular expression objects that match a decimal number are functionally equal:

import re

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

This is pretty much exactly like the /x Perl flag.
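As a quick sanity check, the sketch below runs both forms of the pattern over a few sample strings and compares the results. The sample strings and the names verbose, compact, and results are made up for illustration.

```python
import re

verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.VERBOSE)
compact = re.compile(r"\d+\.\d*")

# Illustrative inputs: two that should match and three that should not.
samples = ["3.14", "10.", ".5", "abc", "7"]

# Pair up the verdict of each pattern for every sample.
results = [(bool(verbose.fullmatch(s)), bool(compact.fullmatch(s)))
           for s in samples]

for sample, (v, c) in zip(samples, results):
    print(repr(sample), v, c)
```

Both patterns should agree on every input, since the whitespace and comments are stripped before the pattern is compiled.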

In Python 3.6 and later you can also control the same flag for just a subsection of your pattern, using the (?x:...) (enable) and (?-x:...) (disable) groups.
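A minimal sketch of the scoped forms, assuming Python 3.6 or later. The names scoped_on and scoped_off are only illustrative.

```python
import re

# Verbose mode is on only inside the group, so the space after the
# closing parenthesis is a literal space that must appear in the input.
scoped_on = re.compile(r"(?x: \d + ) [a-z]+")

# Verbose mode is on for the whole pattern but switched off inside the
# group, so the single space inside (?-x: ) is matched literally.
scoped_off = re.compile(r"\d+ (?-x: ) [a-z]+", re.VERBOSE)

print(bool(scoped_on.fullmatch("42 abc")), bool(scoped_on.fullmatch("42abc")))
print(bool(scoped_off.fullmatch("42 abc")), bool(scoped_off.fullmatch("42abc")))
```

Both patterns require exactly one space between the digits and the letters; only the part of the pattern where whitespace is significant differs.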

Upvotes: 11

hwnd

Reputation: 70732

In addition, inline modifiers can be placed within a regular expression to enforce the relevant matching behavior. In Python's re module, a bare inline modifier such as (?x) applies to the entire regular expression, and bare negating modifiers such as (?-ismx) are not supported.

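import re

# (?x) has to be the very first thing in the pattern; since Python 3.11
# a global inline flag anywhere else is an error.
pattern = re.compile(r'''(?x)
                        \d+ (?# Some numbers)
                        \s+ (?# Whitespace)
                        \d+ (?# More numbers)
                      ''')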
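To see the limitation concretely, this sketch checks which patterns the standard re module accepts. The helper name compiles is hypothetical and exists only for this example.

```python
import re

def compiles(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A bare (?x) at the start of the pattern is accepted.
global_on = compiles(r"(?x) \d+ \s+ \d+")

# A bare (?-x) that tries to switch verbose mode off mid-pattern is rejected.
bare_off = compiles(r"(?x) \d+ (?-x) \d+")

print(global_on, bare_off)
```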

The way around that is to use the third-party regex module (installable from PyPI), in which inline modifiers apply from that point to the end of the group or pattern, and can be turned on or off.

import regex
pattern = regex.compile(r'(?x)  \d+  (?-x)[a-z]+(?x)   \d+', regex.V1)

Upvotes: 3
