Reputation: 13
I learned that r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline: "Regular expressions will often be written in Python code using this raw string notation." So r"\n" is equivalent to "\\n", which also denotes the two-character string '\' and 'n'.
I tested this by printing, and it works:
>>> print(r"\n")
\n
>>> print("\\n")
\n
However, when I tested it with a regex:
>>> import re
>>> re.findall("\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30']
>>> re.findall(r"\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30'] # Still the same as before, seems 'r' doesn't make any difference
>>> re.findall("\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['12', '10', '30'] # Still no difference
It does work when I try this, though:
>>> re.findall(r"\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d']
>>> re.findall("\\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d']
>>> re.findall("\\\\d+", '12 cats, 10 dogs, 30 rabits, \d is here')
['\\d'] # Even four backslashes
Why? Does this mean I have to add one more backslash when using a regex to make sure the pattern is treated as a raw string?
Reference: https://docs.python.org/3/howto/regex.html
Upvotes: 0
Views: 829
Reputation: 370377
The reason that "\d+" works is that "\d" is not a valid escape sequence in Python strings, so Python simply treats it as a backslash followed by a "d" instead of raising a syntax error.
So "\d", "\\d" and r"\d" are all equivalent and represent a string containing one backslash and one "d". The regex engine then sees this backslash + "d" and interprets it as "match any digit".
"\\\d", "\\\\d" and r"\\d", on the other hand, all contain two backslashes followed by a "d". This tells the regex engine to match a backslash followed by a "d".
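To check this yourself, here is a minimal sketch that evaluates each of the six literals and then uses the result as a pattern. The names `evaluate`, `one_backslash_sources` and `two_backslash_sources` are made up for this illustration. The literals are kept as source text and parsed with `ast.literal_eval` because recent Python versions emit a warning for unrecognized escapes such as "\d"; the sketch assumes that warning can simply be ignored here.

```python
import ast
import re
import warnings

# Source text of each literal, stored in raw strings so that this snippet
# itself contains no unrecognized escape sequence.
one_backslash_sources = [r'"\d"', r'"\\d"', r'r"\d"']
two_backslash_sources = [r'"\\\d"', r'"\\\\d"', r'r"\\d"']


def evaluate(sources):
    """Turn each literal's source text into the string Python would build."""
    with warnings.catch_warnings():
        # Unrecognized escapes like "\d" trigger a warning; silence it here.
        warnings.simplefilter("ignore")
        return [ast.literal_eval(src) for src in sources]


one_backslash = evaluate(one_backslash_sources)
two_backslashes = evaluate(two_backslash_sources)

# All three literals in each group produce the same string.
print([len(s) for s in one_backslash])    # [2, 2, 2]
print([len(s) for s in two_backslashes])  # [3, 3, 3]

text = r"12 cats, 10 dogs, 30 rabits, \d is here"

# Append "+" to get the patterns from the question.
digit_matches = [re.findall(p + "+", text) for p in one_backslash]
backslash_matches = [re.findall(p + "+", text) for p in two_backslashes]

print(digit_matches[0])      # ['12', '10', '30']
print(backslash_matches[0])  # ['\\d']
```

In practice, writing patterns as raw strings (r"\d+" for digits, r"\\d+" for a literal backslash followed by "d") avoids relying on how Python handles unrecognized escapes.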
Upvotes: 2