Reputation: 448
I'm working with the content of a POST body and want to extract the value for each key. The data I'm trying to parse is:
s = b'----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="username"\r\n\r\nmyusername\r\n----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="password"\r\n\r\nmypassword\r\n----------------------------941135026682458398564529\r\nContent-Disposition: form-data; name="keyword"\r\n\r\nmykeyword\r\n----------------------------941135026682458398564529--\r\n'
What I want to get are the values myusername, mypassword and mykeyword by using Python's re module. For this reason I wrote this pattern:
pattern = r'\bname=\"{}\"\\r\\n\\r\\n([^-]+)\\r'
which is then modified as needed to match each of the keys:
username_pattern = re.compile(pattern.format("username"))
password_pattern = re.compile(pattern.format("password"))
keyword_pattern = re.compile(pattern.format("keyword"))
The problem I'm facing is that all the backslashes are getting escaped, so when I define pattern, instead of keeping the value I typed, I get every backslash doubled:
'\\bname=\\"{}\\"\\\\r\\\\n\\\\r\\\\n([^-]+)\\\\r'
Then, when I run the <any of the compiled patterns>.search(s) method, there are no matches. I've tested the pattern here and it works as expected with each of the keywords. How can I avoid this backslash escaping? And, in case what I'm asking for is not necessary, what am I doing wrong?
Upvotes: 2
Views: 2769
Reputation: 177550
A raw string only affects how the literal is parsed. The string object has no way to remember what exactly you typed, so when the interpreter shows it to you with the backslashes escaped, it's showing you what the equivalent non-raw literal would have been. The string's contents are unchanged.
These three are equivalent:
>>> re.compile('\r', re.DEBUG)
LITERAL 13
>>> re.compile('\\r', re.DEBUG)
LITERAL 13
>>> re.compile(r'\r', re.DEBUG)
LITERAL 13
But this is not:
>>> re.compile(r'\\r', re.DEBUG)
LITERAL 92
LITERAL 114
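To make the difference between the stored string and its displayed form concrete, here is a small sketch (the variable names are only for illustration). It compares a raw literal with its non-raw equivalent and checks what each regex actually matches:

```python
import re

raw = r'\\r'        # three characters: backslash, backslash, r
escaped = '\\\\r'   # the same three characters, written as a non-raw literal

same_string = raw == escaped   # True: both literals build the same object value
length = len(raw)              # 3, not 4

print(repr(raw))   # displays the non-raw literal form, with doubled backslashes
print(raw)         # displays the actual characters

# r'\r' is a regex escape that matches a carriage return.
matches_cr = re.search(r'\r', 'a\rb') is not None

# r'\\r' is an escaped backslash followed by the letter r,
# so it does not match a carriage return...
double_matches_cr = re.search(r'\\r', 'a\rb') is not None

# ...but it does match the two-character text backslash + r.
double_matches_text = re.search(r'\\r', r'a\rb') is not None
```

So the doubled backslashes in the interpreter output are only a display convention; the missing matches come from the extra backslashes typed inside the raw literal.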
Upvotes: 2
Reputation: 140168
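You're already using the raw prefix, so there's no need to double the backslashes in \r or \n. If you do, the regex looks for a literal backslash followed by r or n. (The regex engine accepts either an actual newline character or the two-character escape \n.) So the only remaining concern is \b, which does need the raw prefix, because in a non-raw string \b is a backspace character: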
pattern = r'\bname="{}"\r\n\r\n([^-]+)\r'
Alternative without the raw prefix:
pattern = '\\bname="{}"\r\n\r\n([^-]+)\r'
With either of those I get matches against your data (when I use it as a str, not bytes).
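If you'd rather keep the body as bytes, one option is a bytes pattern, since in Python 3 a str pattern cannot be used to search a bytes object. The sketch below assumes the data from the question; `field_value` and `BOUNDARY` are hypothetical names introduced only for this example, and `%b` formatting stands in for `str.format`, which bytes objects don't have:

```python
import re
from typing import Optional

BOUNDARY = b'----------------------------941135026682458398564529'

# The POST body from the question, rebuilt piece by piece for readability.
s = (
    BOUNDARY + b'\r\nContent-Disposition: form-data; name="username"\r\n\r\nmyusername\r\n'
    + BOUNDARY + b'\r\nContent-Disposition: form-data; name="password"\r\n\r\nmypassword\r\n'
    + BOUNDARY + b'\r\nContent-Disposition: form-data; name="keyword"\r\n\r\nmykeyword\r\n'
    + BOUNDARY + b'--\r\n'
)


def field_value(data: bytes, name: str) -> Optional[bytes]:
    """Return the value of the form field `name`, or None if it is absent."""
    # Raw bytes literal with single backslashes: \b, \r and \n are regex escapes.
    # re.escape guards against regex metacharacters in the field name.
    pattern = rb'\bname="%b"\r\n\r\n([^-]+)\r' % re.escape(name.encode("ascii"))
    match = re.search(pattern, data)
    return match.group(1) if match else None


username = field_value(s, "username")
password = field_value(s, "password")
keyword = field_value(s, "keyword")
missing = field_value(s, "email")
```

Note that `[^-]+` is kept from the original pattern, so a value containing a hyphen would not be captured; for real multipart bodies a dedicated parser is probably the safer choice.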
Upvotes: 1