Reputation: 1779
I have the following data:
POST / HTTP/1.1
User-Agent: curl/7.27.0
Host: 127.0.0.1
Accept: */*
Content-Length: 55
Content-Type: application/x-www-form-urlencoded
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk
or
POST / HTTP/1.1\r\n
User-Agent: curl/7.27.0\r\n
Host: 127.0.0.1\r\n
Accept: */*\r\n
Content-Length: 55\r\n
Content-Type: application/x-www-form-urlencoded\r\n
\r\n
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n
or
POST / HTTP/1.1^M
User-Agent: curl/7.27.0^M
Host: 127.0.0.1^M
Accept: */*^M
Content-Length: 55^M
Content-Type: application/x-www-form-urlencoded^M
^M
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk^M
How can I match only the id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk string? I mean anything printable between two consecutive end-of-lines (\r\n or ^M) and the next end-of-line (\r\n or ^M).
I tried something like:
re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, re.MULTILINE|re.DOTALL)
but it returns no matches. What am I doing wrong?
Upvotes: 3
Views: 13091
Reputation: 71578
I'm not sure why you have > at the beginning of your regex. That is what prevents you from getting any matches at all. If you remove it, though, you get a lot of matches that you don't seem to need.
I would suggest:
(?<![\r\n])(?:\r\n|\r(?!\n)|\n){2}[^\r\n]+
This ensures that exactly two consecutive line breaks (two \r\n, two \r, or two \n) come before the line you're trying to match. The \r(?!\n) alternative stops a single \r\n from being counted as two breaks (a lone \r followed by a lone \n). The negative lookbehind (?<![\r\n]) enforces the "exactly two" part: it fails the match if a newline or carriage return character precedes the two consecutive line breaks.
The regex above doesn't need the MULTILINE or DOTALL flags, since it uses no ^, $ or ., so you can drop them here if you want to.
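A minimal sketch of the two-line-break pattern in Python, assuming the request arrives with real CRLF line endings (the sample buffer is built from the question's data). It uses the \r(?!\n) variant so a single \r\n cannot count as two breaks, and wraps the line in a capture group so re.findall returns only the body rather than the preceding line breaks.

```python
import re

# Sample request with real CRLF line endings (assumed input, built from the question's data).
buf = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.27.0\r\n"
    "Host: 127.0.0.1\r\n"
    "Accept: */*\r\n"
    "Content-Length: 55\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n"
)

# Exactly two line breaks (\r\n, lone \r, or \n) not preceded by another break,
# followed by the printable line we want (captured). \r(?!\n) keeps a single
# \r\n from being split into two separate breaks.
pattern = re.compile(r"(?<![\r\n])(?:\r\n|\r(?!\n)|\n){2}([^\r\n]+)")

print(pattern.findall(buf))
# ['id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk']
```

The same pattern also picks out the body when the request uses bare \n line endings, for example pattern.findall(buf.replace("\r\n", "\n")).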
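EDIT: Since the \r, \n and ^M in your data appear to be literal text rather than actual control characters (they are not metacharacters), I would suggest this:
(?<![\r\n])(?:(?:\\r\\n|\^M)?(?:\r\n|\r(?!\n)|\n)){2}((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)(?:\\r\\n|\^M)?
Each of the two line breaks may be preceded by a literal \r\n or ^M, and the captured group stops before the next literal \r, \n or ^M.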
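A sketch of the EDIT pattern against the two literal-text variants from the question, assuming each literal \r\n or ^M is followed by a real newline (an assumption about how the data was dumped). The real-newline alternation again uses \r(?!\n) so a real \r\n is not counted as two breaks.

```python
import re

# The question's request lines: headers, blank line, body.
lines = [
    "POST / HTTP/1.1",
    "User-Agent: curl/7.27.0",
    "Host: 127.0.0.1",
    "Accept: */*",
    "Content-Length: 55",
    "Content-Type: application/x-www-form-urlencoded",
    "",
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk",
]

# Line breaks written out as literal text: the four characters \r\n or the
# two characters ^M, each followed by a real "\n" (assumed input format).
literal_crlf = "".join(line + "\\r\\n\n" for line in lines)
literal_caret_m = "".join(line + "^M\n" for line in lines)

# Two line breaks, each optionally preceded by a literal \r\n or ^M; then a
# captured run of characters that stops before any literal \r, \n or ^M;
# then an optional literal terminator.
pattern = re.compile(
    r"(?<![\r\n])(?:(?:\\r\\n|\^M)?(?:\r\n|\r(?!\n)|\n)){2}"
    r"((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)(?:\\r\\n|\^M)?"
)

for buf in (literal_crlf, literal_caret_m):
    # each prints ['id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk']
    print(pattern.findall(buf))
```

Because the literal terminators are optional, the same pattern also matches a request with real \r\n line endings.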
Upvotes: 1
Reputation: 4864
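Try this:
(?:(?:\^M)|[\n\r])+(id=.*?)(?=(?:\^M)|[\n\r])
It assumes the body starts with id=. The lazy .*? stops the capture before the first ^M or line break, so the terminator itself is not captured.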
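A sketch of this pattern on the ^M sample, assuming ^M appears as literal text followed by a real newline. It uses the lazy .*? form; with a greedy .*, the capture would run on to the end of the line and include the trailing ^M (or a trailing \r with real CRLF endings).

```python
import re

# ^M shown as literal text, each followed by a real newline (assumed input format).
buf = (
    "POST / HTTP/1.1^M\n"
    "User-Agent: curl/7.27.0^M\n"
    "Host: 127.0.0.1^M\n"
    "Accept: */*^M\n"
    "Content-Length: 55^M\n"
    "Content-Type: application/x-www-form-urlencoded^M\n"
    "^M\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk^M\n"
)

# One or more ^M / line-break tokens, then the body starting with "id=".
# The lazy .*? stops at the first ^M or line break, so neither is captured.
pattern = re.compile(r"(?:(?:\^M)|[\n\r])+(id=.*?)(?=(?:\^M)|[\n\r])")

print(pattern.findall(buf))
# ['id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk']
```

Swapping the literal ^M for real CRLF endings, as in buf.replace("^M\n", "\r\n"), gives the same result.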
Upvotes: 1