Mohammad Yusuf

Reputation: 17074

Match literal string '\$'

I'm trying to match the literal string '\$'. I'm escaping both '\' and '$' with a backslash. Why doesn't it work when I escape the backslash in the pattern, while using a dot instead does work?

import re

print re.match('\$','\$')
print re.match('\\\$','\$')
print re.match('.\$','\$')

Output:

None
None
<_sre.SRE_Match object at 0x7fb89cef7b90>

Can someone explain what's happening internally?

Upvotes: 3

Views: 3943

Answers (6)

Wasi Ahmad

Reputation: 37761

You should use the re.escape() function for this:

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

For example:

import re
val = re.escape('\$')  # val holds the four characters \\\$
print(re.match(val, '\$'))

It outputs:

<_sre.SRE_Match object; span=(0, 2), match='\\$'>

This is equivalent to what @TigerhawkT3 mentioned in his answer.
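A Python 3 sketch of what re.escape() produces here: both the backslash and the dollar sign are regex metacharacters, so each gets its own backslash and the pattern ends up twice as long as the text it matches. The variable names are illustrative.

```python
import re

text = '\\$'               # a backslash followed by a dollar sign
pattern = re.escape(text)  # both characters are metacharacters, so both get escaped

print(len(text), len(pattern))  # 2 4
print(pattern)                  # \\\$
m = re.match(pattern, text)
print(m.group(0) == text)       # True: the whole text matched literally
```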

Upvotes: 7

Mohammad Yusuf

Reputation: 17074

Thanks for the answers above. I'm adding this one because none of them gives a short summary.

The backslash \ has to be escaped twice: once for the Python string literal and once for the regex engine.

A (non-raw) Python string literal turns \\ into a single \, and the regex engine needs \\ to match a single \.

So to hand the regex engine \\ (which matches one \), you have to write \\\\ in the Python string literal.

\\\\ --> Python (string literal parsing) --> \\ --> regex engine --> matches \
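The two stages can be checked separately with a short Python 3 sketch (variable names are illustrative): the length shows what survives the string-literal stage, and the match shows what the regex engine does with it.

```python
import re

source_pattern = '\\\\'          # four backslashes in the source code
print(len(source_pattern))       # 2: the string-literal stage kept two
print(source_pattern == r'\\')   # True: a raw string skips that stage

m = re.match(source_pattern, '\\')  # the text is one backslash
print(len(m.group(0)))              # 1: the regex engine read \\ as one literal \
```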

Upvotes: 0

Waxrat

Reputation: 2185

In a (non-raw) string literal, backslash is special: it tells the Python interpreter to handle the following character specially. For example, "\n" is a string of length 1 containing the newline character. "\$", however, is not a recognized escape sequence, so Python leaves the backslash in place and the result is a string of two characters (newer Python versions also emit a warning for it). "\\$" is the explicit way to write the same two characters: a backslash and a dollar sign.

In regular expressions, the backslash also means the following character is to be handled specially, but the special meanings are generally different. In a regular expression, $ matches at the end of the string (or at the end of each line in MULTILINE mode), \$ matches a literal dollar sign, \\ matches a single backslash, and \\$ matches a backslash at the end of the string.
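A small Python 3 check of those four regex meanings, using raw strings so that each pattern is exactly what the regex engine sees. The sample texts are made up for illustration.

```python
import re

# Raw strings: each pattern below reaches the regex engine unchanged.
end_anchor = re.search(r'$', 'a$')       # $   : end of the string
literal_dollar = re.search(r'\$', 'a$')  # \$  : a literal dollar sign
backslash = re.search(r'\\', 'a\\b')     # \\  : a single literal backslash
trailing = re.search(r'\\$', 'a\\')      # \\$ : a backslash at the very end
not_trailing = re.search(r'\\$', '\\a')  # backslash present, but not at the end

print(end_anchor.start())      # 2 (after the last character)
print(literal_dollar.start())  # 1
print(backslash.start())       # 1
print(trailing.start())        # 1
print(not_trailing)            # None
```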

So, when you do re.match('\$', s), the Python interpreter reads '\$', builds the 2-character string \$, and passes that string object to re.match, which treats it as a literal dollar sign. re.match('\\$', s) passes exactly the same 2-character string \$. To match a literal backslash followed by a dollar sign, the regex engine has to receive the four characters \\\$.

To see what's actually being passed to re.match, just print it. For example:

import re
s = '\\$'  # the text to match: a backslash followed by a dollar sign
pat = '\\$'
print("pat :" + pat + ":")  # prints pat :\$:
m = re.match(pat, s)

People usually use raw string literals to avoid the double-meaning of backslashes.

pat = r'\$' # same 2-character string as above
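A Python 3 sketch to confirm how the string-literal stage treats the spellings discussed above. It parses the question's unrecognized escape '\$' with ast.literal_eval (silencing the warning newer Python versions give for it) and compares the result with the doubled and raw spellings. The variable names are illustrative.

```python
import ast
import warnings

with warnings.catch_warnings():
    # Newer Python versions warn about the unrecognized escape \$.
    warnings.simplefilter("ignore")
    unrecognized = ast.literal_eval(r"'\$'")  # the literal '\$' from the question

doubled = '\\$'  # backslash escaped explicitly
raw = r'\$'      # raw string: backslash kept as-is

print(len(unrecognized), len(doubled), len(raw))  # 2 2 2
print(unrecognized == doubled == raw)             # True
```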

Upvotes: 1

tomc

Reputation: 1207

r'string' is a raw string literal.

Try marking your regex string as raw. Here are the same regexes with and without the r prefix:

print( re.match(r'\\\$', '\$'))
<_sre.SRE_Match object; span=(0, 2), match='\\$'>


print( re.match('\\\$', '\$'))
None

(This is Python 3.)

Upvotes: 1

TigerhawkT3

Reputation: 49330

Unfortunately, you need more backslashes. You need to escape them to indicate that they're literals in the string and get them into the expression, and then further escape them to indicate that they're literals instead of regex special characters. This is why raw strings are often used for regular expressions: the backslashes don't explode.

>>> import re
>>> print re.match('\$','\$')
None
>>> print re.match('\\\$','\$')
None
>>> print re.match('.\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
>>> print re.match('\\\\\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
>>> print re.match(r'\\\$','\$')
<_sre.SRE_Match object at 0x01E1F800>
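To see what each line of the session actually hands to the regex engine, here is a Python 3 sketch that spells every pattern without unrecognized escapes and prints the resulting characters, their count, and whether re.match succeeds. The names text and candidates are illustrative.

```python
import re

text = '\\$'  # the subject: a backslash followed by a dollar sign

candidates = [
    '\\$',      # question's '\$'   -> \$   : literal $, but the text starts with \
    '\\\\$',    # question's '\\\$' -> \\$  : literal \, then end of string
    '.\\$',     # question's '.\$'  -> .\$  : any character, then literal $
    '\\\\\\$',  # '\\\\\$' above    -> \\\$ : literal \, then literal $
    r'\\\$',    # raw string        -> \\\$ : the same four characters
]

for pat in candidates:
    # pattern as the engine sees it, its length, and whether it matched
    print(pat, len(pat), re.match(pat, text) is not None)
```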

Upvotes: 3

Aashutosh jha

Reputation: 628

You have to use ., because . matches any character except a newline.
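A quick Python 3 check of the dot's behavior. Because . accepts any character other than a newline (unless re.DOTALL is set), the pattern .\$ from the question matches x$ just as readily as \$; it does not specifically require a backslash.

```python
import re

pattern = '.\\$'  # the regex .\$ : any character, then a literal dollar sign

print(re.match(pattern, '\\$') is not None)             # True: backslash then $
print(re.match(pattern, 'x$') is not None)              # True: any other character works too
print(re.match(pattern, '\n$') is None)                 # True: . skips a newline by default
print(re.match(pattern, '\n$', re.DOTALL) is not None)  # True: DOTALL lets . match a newline
```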

Upvotes: -1
