metacharacters with slash inside brackets

Question

From https://docs.python.org/2/howto/regex.html, I learned that the backslash is not necessary:

Metacharacters are not active inside classes. For example, [akm$] will match any of the characters 'a', 'k', 'm', or '$'; '$' is usually a metacharacter, but inside a character class it’s stripped of its special nature.

But it also says:

(The predefined sets of characters) can be included inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or ',' or '.'.

So how should I understand these two statements, which seem to give contradictory advice on when \ works inside []? Thanks.

Martijn Pieters · Accepted Answer

\ only has an effect inside a character class when it introduces a predefined character set. To match the backslash itself, you have to escape it, as the metacharacter that it is, by doubling it.

But yes, the first statement glosses over this a little too easily.

Technically speaking, \s, \w, etc. are not meta-characters. They are pre-defined character class sets, so the definition still holds. Neither is the backslash; it defines the start of an escape sequence instead. The proper way to escape an escape sequence, even in character classes, is to double the backslash.
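To make the distinction concrete, here is a minimal sketch that puts the two documentation examples, [akm$] and [\s,.], next to a doubled backslash. The sample strings and variable names are made up for illustration only.

```python
import re

# '$' loses its end-of-string meaning inside a class and matches literally.
dollar_hits = re.findall(r'[akm$]', 'mark $5')

# '\s' is a predefined set, so it keeps working inside a class.
shorthand_hits = re.findall(r'[\s,.]', 'a, b.')

# A literal backslash inside a class must be doubled, even in a raw string.
backslash_hits = re.findall(r'[\\$]', r'C:\tmp $HOME')

print(dollar_hits)     # ['m', 'a', 'k', '$']
print(shorthand_hits)  # [',', ' ', '.']
print(backslash_hits)  # ['\\', '$']
```

The last result is printed as '\\' because that is how Python displays a one-character string holding a single backslash.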

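Note that any escape sequence that is not recognised leaves just one character in the re pattern: the ineffective backslash is ignored and only the next character is used. \C is not a known character class, so the pattern contains the character C at that point. (This describes Python 2, which the linked documentation covers; Python 3.6 and later raise an error for an unknown escape made of a backslash and an ASCII letter.)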
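Whether an unknown escape is silently reduced to its second character depends on the interpreter version. The following is a small sketch for checking what your own interpreter does; the helper name `class_escape_behaviour` is hypothetical, and the claim that newer interpreters reject the pattern assumes Python 3.6 or later.

```python
import re

def class_escape_behaviour(pattern, text):
    """Return the matches, or a description of the error the pattern raised."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        # Python 3.6+ reports unknown letter escapes as "bad escape".
        return 'error: {}'.format(exc)

result = class_escape_behaviour(r'[\C]', r'\C and C')
print(result)
```

On Python 2 this should print the two matched C characters; on a recent Python 3 it should print an error message instead.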

There are metacharacters that do consist of an escape sequence, such as \A, \Z, and \B; these are just regular A, Z, and B characters when used in a character class. \b is special; in a character class it is a backspace character, just like in Python string literals (analogous to how \n, \r, \t, \a and \v are interpreted the same as in string literals).
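The two meanings of \b can be compared directly. This is a short sketch with an invented sample string that contains a backspace character (\x08).

```python
import re

text = 'ab\x08 cd'  # contains one backspace character between 'ab' and ' cd'

# Inside a character class, \b means the backspace character.
backspace_hits = re.findall(r'[\b]', text)

# Outside a class, \b is a zero-width word boundary.
boundary_hits = re.findall(r'\bcd', text)

print(backspace_hits)  # ['\x08']
print(boundary_hits)   # ['cd']
```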

Demo:

>>> import re
>>> re.findall(r'[\\]', r'\ the backslash will match')
['\\']
>>> re.findall(r'[\C]', r'\C is not a valid escape sequence, only Cs will match')
['C', 'C']
>>> re.findall(r'[\s]', r'No s will match, whitespace is matched instead')
[' ', ' ', ' ', ' ', ' ', ' ', ' ']
