Reputation: 3616
I understand that to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\".
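As a quick check of the two notations, here is a minimal sketch (the sample value `text` is made up for illustration) showing that both spellings hand the same two-character pattern to the regex engine:

```python
import re

# Hypothetical sample: three characters, namely a, one backslash, b.
text = "a\\b"

# Raw pattern: the regex engine receives \\ (an escaped backslash).
raw_match = re.search(r"\\", text)

# Non-raw pattern: Python first collapses "\\\\" to \\,
# so the regex engine ends up with the same pattern.
plain_match = re.search("\\\\", text)

print(len(text))                               # 3
print(raw_match.span(), plain_match.span())    # (1, 2) (1, 2)
print(r"\\" == "\\\\")                         # True
```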
When I saw the code string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string), I wondered what the backslashes in \' and \` mean, since the pattern works just as well with a plain ' and `, as in string = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", string). Is there any need for the backslashes here?
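One way to compare the two patterns is to run both over the same input. The sketch below uses a made-up `sample` string; it only shows that the results agree for this input, which is what one would expect given that a backslash before a non-alphanumeric character in a regex simply matches that character:

```python
import re

# The two character classes from the question.
escaped = r"[^A-Za-z0-9(),!?\'\`]"
plain = r"[^A-Za-z0-9(),!?'`]"

# Hypothetical sample containing a quote, backticks and other punctuation.
sample = "peter jackson's `vision` -- is it #1?"

cleaned_escaped = re.sub(escaped, " ", sample)
cleaned_plain = re.sub(plain, " ", sample)

# The pattern strings differ (one has two extra backslashes),
# but they match the same set of characters.
print(escaped == plain)                    # False
print(cleaned_escaped == cleaned_plain)    # True
```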
I tried some examples in Python:
str1 = "\'s"
print(str1)
str2 = "'s"
print(str2)
Both print 's. I think this might be why the earlier code uses \'\` in string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string). Is there any difference between "\'s" and "'s"?
string = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
re.match(r"\\", string)
re.match returns nothing, which seems to show there is no backslash in the string. However, I do see backslashes in it. Is the backslash in \' actually not a backslash?
Upvotes: 1
Views: 2205
Reputation: 812
Check out https://docs.python.org/2.0/ref/strings.html for a better explanation.
The problem with your second example is that string isn't a raw string, so the \' is interpreted as '. If you change it to a raw string, the backslashes are kept and the search finds one:
>>> import re
>>> not_raw = 'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res1 = re.search(r'\\',not_raw)
>>> type(res1)
<type 'NoneType'>
>>> raw = r'adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .'
>>> res2 = re.search(r'\\',raw)
>>> type(res2)
<type '_sre.SRE_Match'>
For an explanation of re.match vs re.search, see: What is the difference between Python's re.search and re.match?
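For readers on Python 3, a rough equivalent of the session above might look like this. The shortened strings are an assumption made for brevity; the point is that re.match only tries the start of the string, so it returns None even when a backslash is present further along:

```python
import re

# Shortened, hypothetical versions of the strings from the question.
not_raw = 'peter jackson\'s expanded vision'
raw = r'peter jackson\'s expanded vision'

# Does the string itself contain a backslash character?
has_backslash_not_raw = "\\" in not_raw
has_backslash_raw = "\\" in raw
print(has_backslash_not_raw, has_backslash_raw)    # False True

# re.match anchors at position 0, where the character is 'p'.
match_result = re.match(r"\\", raw)
# re.search scans the whole string.
search_result = re.search(r"\\", raw)

print(match_result)            # None
print(search_result.span())    # (13, 14)
```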
Upvotes: 1
Reputation: 337
In Python, those are escaped characters: the backslash is there because the character can have another meaning to the code than the one it has on screen (for example, a single quote ends a string that was opened with a single quote). You can see all of the Python string literals here, but the reason no backslashes were found in that string is that each \' is read as an escaped single quote, so the string itself contains only '. Inside a double-quoted string the backslash isn't necessary, but it is still valid syntax because it is sometimes needed.
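A small sketch of that point, using the literals from the question plus a raw-string variant added here for contrast:

```python
# "\'" is an escape sequence for a single quote, so both literals
# produce the same two-character string.
str1 = "\'s"
str2 = "'s"
print(str1 == str2)            # True
print(len(str1), len(str2))    # 2 2

# In a raw string the backslash is kept as a real character.
raw = r"\'s"
print(len(raw))                # 3

# The escape is only required when the quote would otherwise end the literal.
needs_escape = 'jackson\'s'
no_escape_needed = "jackson's"
print(needs_escape == no_escape_needed)    # True
```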
Upvotes: 2