Xiong Lei

Reputation: 13

Why does Regex raw string prefix "r" not work as expected?

I learned that r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline, and that "regular expressions will often be written in Python code using this raw string notation." I also learned that r"\n" is equivalent to "\\n": both denote the two-character string '\' followed by 'n'.
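A quick way to check that equivalence is to compare the strings and their lengths directly. This is a small sketch using only built-in string operations; the variable names are illustrative.

```python
raw = r"\n"        # raw string: a backslash followed by 'n'
escaped = "\\n"    # escaped backslash followed by 'n'
newline = "\n"     # a single newline character

print(raw == escaped)           # True
print(len(raw), len(newline))   # 2 1
print(list(raw))                # ['\\', 'n']
```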

I tested it by printing, and it works:

>>> print(r"\n")
\n
>>> print("\\n")
\n
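One source of confusion here is that print writes the string's characters as they are, while the interactive prompt (like repr) shows a quoted literal in which every backslash is doubled. A minimal illustration:

```python
s = r"\n"

print(s)        # \n     (print shows the two characters themselves)
print(repr(s))  # '\\n'  (repr shows a quoted literal with the backslash escaped)
```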

However, when I tested it with regular expressions:

>>> import re
>>> re.findall("\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['12', '10', '30']
>>> re.findall(r"\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['12', '10', '30']  # Same as before; the 'r' prefix seems to have no effect
>>> re.findall("\\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['12', '10', '30']  # Still no difference

When I tried the following, though, the literal \d was matched:

>>> re.findall(r"\\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['\\d']
>>> re.findall("\\\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['\\d']
>>> re.findall("\\\\d+", '12 cats, 10 dogs, 30 rabbits, \d is here')
['\\d']  # Even with four backslashes

Why? Does this mean I have to add one more backslash when using regex to make sure it is a raw string?

Reference: https://docs.python.org/3/howto/regex.html

Upvotes: 0

Views: 829

Answers (1)

sepp2k

Reputation: 370377

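The reason "\d+" works is that "\d" is not a recognized escape sequence in Python string literals, so Python simply keeps it as a backslash followed by a "d" instead of raising a syntax error. (Newer versions do warn about it: a DeprecationWarning since Python 3.6 and a SyntaxWarning since Python 3.12.)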
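To see that an unrecognized escape is kept as-is, one option is to evaluate the source text "\d" at runtime with the warning silenced. This is only a demonstration sketch; in real code a raw string or an escaped backslash is the idiomatic spelling.

```python
import warnings

# Python source text for the literal "\d" (an unrecognized escape sequence).
source = '"\\d"'

with warnings.catch_warnings():
    # Newer Pythons warn about invalid escapes; silence that for this demo.
    warnings.simplefilter("ignore")
    value = eval(source)

print(len(value))        # 2
print(value == "\\d")    # True: the backslash was kept
```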

So "\d", "\\d" and r"\d" are all equivalent and represent a string containing one backslash and one "d". The regex engine then sees this backslash + "d" and interprets it as "match any digit".
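The equivalence can be checked directly. The sketch below uses the question's sample text (written as a raw string so the \d in it needs no warning-prone escape) and omits the plain "\d+" spelling only because recent Pythons warn about it.

```python
import re

text = r"12 cats, 10 dogs, 30 rabbits, \d is here"

# Both spellings give the same pattern string: backslash, 'd', '+'.
print("\\d+" == r"\d+")  # True

for pattern in ("\\d+", r"\d+"):
    # The regex engine reads backslash + 'd' as "any digit".
    print(re.findall(pattern, text))  # ['12', '10', '30']
```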

"\\\d", "\\\\d" and r"\\d", on the other hand, all contain two backslashes followed by a "d". This tells the regex engine to match a literal backslash followed by a "d".
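A short sketch of the two-backslash case, again on the question's sample text. It also shows why the output looks like ['\\d']: the list's repr doubles the backslash, but the matched string itself is only two characters long.

```python
import re

text = r"12 cats, 10 dogs, 30 rabbits, \d is here"

pattern = r"\\d+"           # two backslashes, then 'd' and '+'
print(len(pattern))         # 4
print(pattern == "\\\\d+")  # True

matches = re.findall(pattern, text)
print(matches)              # ['\\d']  (repr doubles the backslash)
print(len(matches[0]))      # 2: one backslash and one 'd'
```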

Upvotes: 2
