Reputation: 9064
I found the following regex substitution example in the Regex documentation. I'm a little confused as to what the r prefix before the string does.
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
... r'static PyObject*\npy_\1(void)\n{',
... 'def myfunc():')
Upvotes: 29
Views: 33050
Reputation: 36630
The current re module docs explain raw-string usage:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
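To make the quoted passage concrete, here is a small sketch that checks the lengths it mentions and matches a literal backslash both ways. The variable names and the sample string are made up for illustration.

```python
import re

raw = r"\n"      # two characters: a backslash followed by 'n'
cooked = "\n"    # one character: a newline

print(len(raw), len(cooked))   # 2 1
print(raw == "\\n")            # True: the raw form equals the doubled form

# Matching one literal backslash: the regex itself must be \\ (two characters).
text = "C:\\temp"              # hypothetical sample; the actual string is C:\temp
print(re.findall(r"\\", text))                               # ['\\']
print(re.findall(r"\\", text) == re.findall("\\\\", text))   # True
```

Both patterns reach the regex engine as the same two-character string; the raw literal just spares you the second layer of escaping.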
Upvotes: 0
Reputation: 7870
The r means that the string is to be treated as a raw string, which means backslash escape sequences are left as written instead of being interpreted.
The Python documentation states this precisely:
String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.
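As a rough illustration of why this matters for the replacement string in the question: without the r prefix, Python reads '\1' as an octal escape (the character with code 1), so the regex engine never sees a backreference. The shortened pattern and the 'py_' replacement below are simplified stand-ins, not the exact strings from the question.

```python
import re

# In a normal literal '\1' is an octal escape; in a raw literal it stays two characters.
print(len('\1'), len(r'\1'))   # 1 2
print('\1' == '\x01')          # True

pattern = r'def\s+(\w+)\s*\(\s*\):'   # simplified stand-in pattern

with_raw = re.sub(pattern, r'py_\1', 'def myfunc():')
without_raw = re.sub(pattern, 'py_\1', 'def myfunc():')

print(with_raw)            # py_myfunc
print(repr(without_raw))   # 'py_\x01'
```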
Upvotes: 7
Reputation:
Placing r or R before a string literal creates what is known as a raw-string literal. Raw strings do not process escape sequences (\n, \b, etc.) and are thus commonly used for Regex patterns, which often contain a lot of \ characters.
Below is a demonstration:
>>> print('\n') # Prints a newline character
>>> print(r'\n') # Escape sequence is not processed
\n
>>> print('\b') # Prints a backspace character
>>> print(r'\b') # Escape sequence is not processed
\b
>>>
The only other option would be to double every backslash:
>>> re.sub('def\\s+([a-zA-Z_][a-zA-Z_0-9]*)\\s*\\(\\s*\\):',
... 'static PyObject*\\npy_\\1(void)\\n{',
... 'def myfunc():')
which is just tedious.
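If you want to convince yourself that the two spellings are interchangeable, a quick check like the following should do. The variable names are only for illustration.

```python
import re

raw_pattern = r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'
doubled_pattern = 'def\\s+([a-zA-Z_][a-zA-Z_0-9]*)\\s*\\(\\s*\\):'

raw_repl = r'static PyObject*\npy_\1(void)\n{'
doubled_repl = 'static PyObject*\\npy_\\1(void)\\n{'

# Both spellings produce exactly the same string objects.
print(raw_pattern == doubled_pattern)   # True
print(raw_repl == doubled_repl)         # True

# re.sub itself turns the \n in the replacement into a newline and \1 into group 1.
result = re.sub(raw_pattern, raw_repl, 'def myfunc():')
print(result)
```

The printed result spans three lines (static PyObject*, then py_myfunc(void), then {), because the regex engine, not the Python string parser, expands \n in the replacement.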
Upvotes: 40