tulians

Reputation: 448

Avoid escaping characters in regex

I'm working with the content of a POST body and want to extract the value for each key. The data I'm trying to parse is:

s = b'----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="username"\r\n\r\nmyusername\r\n----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="password"\r\n\r\nmypassword\r\n----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="keyword"\r\n\r\nmykeyword\r\n----------------------------941135026682458398564529--\r\n'

What I want to get are the values myusername, mypassword and mykeyword by using Python's re module. For this reason I generated this pattern:

pattern = r'\bname=\"{}\"\\r\\n\\r\\n([^-]+)\\r'

which is then modified as needed to match each of the keys:

username_pattern = re.compile(pattern.format("username"))
password_pattern = re.compile(pattern.format("password"))
keyword_pattern = re.compile(pattern.format("keyword"))

The problem I'm facing is that all the backslashes appear to get escaped: when I inspect pattern after defining it, instead of the value I typed I see every backslash doubled:

'\\bname=\\"{}\\"\\\\r\\\\n\\\\r\\\\n([^-]+)\\\\r'

Then, when I run the <any of the compiled patterns>.search(s) method there are no matches. I've tested the pattern here and it works as expected with each of the keywords. How can I avoid this backslash escaping? And, in the case that what I'm asking is not necessary, what am I doing wrong?

Upvotes: 2

Views: 2769

Answers (2)

Josh Lee

Reputation: 177550

A raw string only affects how the literal is parsed. The string object has no way to remember exactly what you typed, so when the interpreter displays it with doubled backslashes, it is showing you what the equivalent non-raw literal would have been.

These three are equivalent:

>>> re.compile('\r', re.DEBUG)
LITERAL 13
>>> re.compile('\\r', re.DEBUG)
LITERAL 13
>>> re.compile(r'\r', re.DEBUG)
LITERAL 13

But this is not:

>>> re.compile(r'\\r', re.DEBUG)
LITERAL 92
LITERAL 114
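
As a rough illustration of the same point, the sketch below compares the displayed form of the question's pattern with its actual contents, and checks what r'\\r' matches. The variable names are made up for this example:

```python
import re

# The pattern from the question, written as a raw string literal.
pattern = r'\bname=\"{}\"\\r\\n\\r\\n([^-]+)\\r'

# repr() is what the interactive prompt displays: it doubles each
# backslash so the output is a valid non-raw literal. The string
# itself still holds exactly the characters that were typed.
shown = repr(pattern)
print(shown)    # display form, backslashes doubled
print(pattern)  # actual contents

# r'\r' and '\\r' are the same two-character string (backslash + 'r'),
# while '\r' is a single carriage-return character.
same_literal = (r'\r' == '\\r')
raw_len = len(r'\r')
cooked_len = len('\r')

# r'\\r' is the regex "escaped backslash, then r": it matches the two
# characters \ and r, and does not match a carriage return.
matches_cr = re.search(r'\\r', 'a\rb') is not None
matches_backslash_r = re.search(r'\\r', 'a\\rb') is not None
```

Here matches_cr is False and matches_backslash_r is True, which is why the doubled escapes in the question's pattern never find the real \r\n bytes in the body.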

Upvotes: 2

Jean-François Fabre

Reputation: 140168

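You're already using the raw prefix, so there is no need to double-escape \r or \n; if you do, they are matched as a literal backslash followed by r or n. (The regex engine accepts either the actual control character or the two-character escape, e.g. a real newline or \n.) The only escape that really needs the raw prefix is \b, because in a non-raw literal '\b' is a backspace character rather than a word boundary: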

pattern = r'\bname="{}"\r\n\r\n([^-]+)\r'

Alternative without the raw prefix:

pattern = '\\bname="{}"\r\n\r\n([^-]+)\r'

With either of those I get matches against your data (when I use it as a str, not bytes).
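
If the body has to stay as bytes, one possible approach is to format the pattern as a str and then encode it, since re requires the pattern and the searched data to be of the same type. This is only a sketch: the extract helper and the values dict are hypothetical names, and re.escape is an extra precaution that the original pattern does not use.

```python
import re

# The POST body from the question, kept as bytes.
s = (
    b'----------------------------941135026682458398564529\r\n'
    b'Content-Disposition: form-data; name="username"\r\n\r\nmyusername\r\n'
    b'----------------------------941135026682458398564529\r\n'
    b'Content-Disposition: form-data; name="password"\r\n\r\nmypassword\r\n'
    b'----------------------------941135026682458398564529\r\n'
    b'Content-Disposition: form-data; name="keyword"\r\n\r\nmykeyword\r\n'
    b'----------------------------941135026682458398564529--\r\n'
)

# Raw string with single backslashes, as suggested above.
pattern = r'\bname="{}"\r\n\r\n([^-]+)\r'


def extract(field, body):
    """Hypothetical helper: return one field's value as str, or None."""
    # Fill in the field name while the pattern is still a str, then
    # encode it so that it can be used to search a bytes object.
    compiled = re.compile(pattern.format(re.escape(field)).encode('ascii'))
    match = compiled.search(body)
    return match.group(1).decode('utf-8') if match else None


values = {name: extract(name, s) for name in ('username', 'password', 'keyword')}
print(values)
# {'username': 'myusername', 'password': 'mypassword', 'keyword': 'mykeyword'}
```

Note that [^-]+ stops at the first hyphen, so a value that itself contains a hyphen would not match with this pattern.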

Upvotes: 1
