planetp

Reputation: 16123

When to use raw strings in regex patterns?

From the documentation on regular expressions, I understand that "raw" strings are recommended for patterns so that backslashes are not handled in any special way:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.
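To illustrate the quoted passage, here is a minimal sketch. The sample string and variable names are made up for illustration; only `re.search` comes from the standard library.

```python
import re

text = "C:\\temp"  # a 7-character string containing a single backslash

# Regular string literal: the four backslashes collapse to two, which the
# regex engine then reads as one escaped (literal) backslash.
plain_pattern = "\\\\"

# Raw string literal: the two backslashes reach the regex engine unchanged.
raw_pattern = r"\\"

same_pattern = plain_pattern == raw_pattern
plain_match = re.search(plain_pattern, text)
raw_match = re.search(raw_pattern, text)

print(same_pattern, plain_match.start(), raw_match.start())
```

Both spellings produce the same two-character pattern; the raw form is simply easier to read.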

What other cases (apart from the literal backslash) require raw strings?

Upvotes: 1

Views: 2138

Answers (1)

Eugene Yarmash

Reputation: 150188

Another example is sequences like \1 and \2, which are octal escapes in Python string literals but references to captured groups in regular expressions.

>>> re.search(r"(\w+) \1", "the the")
<_sre.SRE_Match object; span=(0, 7), match='the the'>
>>> re.search("(\w+) \1", "the the")
>>> 
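The same kind of clash affects \b, which is a backspace character in a regular string literal but a word boundary in a regular expression. A minimal sketch, with illustrative variable names and sample text that are not from the original answer:

```python
import re

# In a regular string literal, "\b" is the backspace character (0x08);
# in a raw string it stays as backslash + "b", which re treats as a word boundary.
backspace_pattern = "\bcat\b"
boundary_pattern = r"\bcat\b"

no_match = re.search(backspace_pattern, "a cat sat")
boundary_match = re.search(boundary_pattern, "a cat sat")

# "\1" in a regular string literal is the single character with octal code 1,
# while r"\1" is the two characters that re interprets as a backreference.
octal_char = "\1"
backref = r"\1"

print(len(backspace_pattern), len(boundary_pattern))
print(no_match, boundary_match.span())
print(octal_char == chr(1), len(backref))
```

Comparing the lengths shows that the non-raw pattern has already lost its backslashes before `re` ever sees it, which is why it finds no match.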

Upvotes: 4
