bsteo

Reputation: 1779

Python REGEX matching a multiline with carriage return

I have the following data:

POST / HTTP/1.1
User-Agent: curl/7.27.0
Host: 127.0.0.1
Accept: */*
Content-Length: 55
Content-Type: application/x-www-form-urlencoded

id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk

or

POST / HTTP/1.1\r\n
User-Agent: curl/7.27.0\r\n
Host: 127.0.0.1\r\n
Accept: */*\r\n
Content-Length: 55\r\n
Content-Type: application/x-www-form-urlencoded\r\n
\r\n
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n

or

POST / HTTP/1.1^M
User-Agent: curl/7.27.0^M
Host: 127.0.0.1^M
Accept: */*^M
Content-Length: 55^M
Content-Type: application/x-www-form-urlencoded^M
^M
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk^M

How can I match only the string id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk? I mean any printable text between two consecutive line endings (\r\n or ^M) and the next line ending (\r\n or ^M). I tried something like:

re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, re.MULTILINE|re.DOTALL)

but it returns no match. What am I doing wrong?

Upvotes: 3

Views: 13091

Answers (2)

Jerry

Reputation: 71578

I'm not sure why you have > at the beginning of your regex; it is what prevents you from getting any matches at all. If you remove it, you get a lot of matches that you don't seem to need.
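
To illustrate the point about the leading >, here is a small sketch. It assumes buf holds the request from the question with real CRLF line endings; the variable names are only for illustration.

```python
import re

# Assumed sample: the request from the question with real CRLF line endings.
buf = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.27.0\r\n"
    "Host: 127.0.0.1\r\n"
    "Accept: */*\r\n"
    "Content-Length: 55\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n"
)
flags = re.MULTILINE | re.DOTALL

# The pattern from the question: every match must begin with a literal '>'.
with_gt = re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, flags)

# Same pattern without the '>': it now matches, but far too eagerly.
without_gt = re.findall(r'^([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, flags)

print(with_gt)          # empty, because no line starts with '>'
print(len(without_gt))  # more than one (line, fragment) pair
```

Without the >, the pattern pairs header lines with the start of whatever follows them, which is why it needs a different anchor than "start of any line".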

I would suggest:

(?<![\r\n])(?:\r\n|\r|\n){2}[^\r\n]+

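This is meant to ensure that exactly two consecutive line endings (each \r\n, \r, or \n) come before the line you're trying to match. The negative lookbehind (?<![\r\n]) enforces the "exactly": it fails the match if a newline or carriage return character precedes the two line endings. One caveat: because \r and \n are also accepted on their own, a single \r\n can satisfy {2} through backtracking (\r, then \n), so on data with real CRLF line endings the pattern also matches lines that follow just one line ending.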

The above regex doesn't really need the multiline and dotall flags, so you can drop them in this instance if you want to.
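
A quick way to check the pattern in Python is sketched below. The helper make_request and the STRICT pattern are my own illustration, not part of the original suggestion: STRICT only accepts a lone \r when no \n follows, so a single CRLF cannot be counted as two line endings, and it adds a capture group so findall returns the line without the leading newlines.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
HEADERS = [
    "POST / HTTP/1.1",
    "User-Agent: curl/7.27.0",
    "Host: 127.0.0.1",
    "Accept: */*",
    "Content-Length: 55",
    "Content-Type: application/x-www-form-urlencoded",
]

def make_request(eol):
    """Build the sample request using `eol` as the (real) line ending."""
    return eol.join(HEADERS + ["", BODY]) + eol

SUGGESTED = r'(?<![\r\n])(?:\r\n|\r|\n){2}[^\r\n]+'
# Hypothetical stricter variant: a lone \r counts only if no \n follows it.
STRICT = r'(?<![\r\n])(?:\r\n|\r(?!\n)|\n){2}([^\r\n]+)'

lf = make_request("\n")
crlf = make_request("\r\n")

# With LF-only data the suggested pattern finds just the body line
# (the match includes the two leading newlines, hence the strip()).
lf_suggested = [m.strip() for m in re.findall(SUGGESTED, lf)]

# With CRLF data the suggested pattern also picks up header lines,
# because one \r\n can be consumed as \r followed by \n.
crlf_suggested = re.findall(SUGGESTED, crlf)

# The stricter variant returns only the body for both inputs.
crlf_strict = re.findall(STRICT, crlf)
lf_strict = re.findall(STRICT, lf)

print(lf_suggested)
print(len(crlf_suggested))
print(crlf_strict)
```

If the input is always a well-formed HTTP request, splitting once on the first blank line would also work; the regex route is kept here because that is what the question asks about.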



EDIT: If the \r\n and ^M in your data are literal text rather than actual control characters, I would suggest this:

(?<![\r\n])(?:(?:\\r\\n|\^M)?(?:\r\n|\r|\n)){2}((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)(?:\\r\\n|\^M)?
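
Here is a minimal sketch of how that pattern could be used from Python. It assumes each line of the data ends with the literal text \r\n (backslash, r, backslash, n) or ^M (caret, M), followed by a real "\n"; the helper make_literal_request is hypothetical. With real CRLF line endings the pattern can over-match, since a single \r\n can be consumed as \r followed by \n.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
LINES = [
    "POST / HTTP/1.1",
    "User-Agent: curl/7.27.0",
    "Host: 127.0.0.1",
    "Accept: */*",
    "Content-Length: 55",
    "Content-Type: application/x-www-form-urlencoded",
    "",      # the blank line that separates headers from the body
    BODY,
]

def make_literal_request(marker):
    """Append the literal marker text and a real newline to every line."""
    return "".join(line + marker + "\n" for line in LINES)

PATTERN = (
    r'(?<![\r\n])(?:(?:\\r\\n|\^M)?(?:\r\n|\r|\n)){2}'
    r'((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)(?:\\r\\n|\^M)?'
)

# r"\r\n" is four characters of literal text, not a CR/LF pair.
backslash_form = make_literal_request(r"\r\n")
caret_form = make_literal_request("^M")

# Group 1 holds the body line without the trailing literal marker.
backslash_result = re.findall(PATTERN, backslash_form)
caret_result = re.findall(PATTERN, caret_form)

print(backslash_result)
print(caret_result)
```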


Upvotes: 1

Sujith PS

Reputation: 4864

Try this:

(?:(?:\^M)|[\n\r])+(id=.*)(?=(?:\^M)|[\n\r])
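
A small sketch of this pattern in Python follows, with one caveat worth checking on your own data: without re.DOTALL the dot still matches \r and the literal text ^M, so the greedy .* runs up to the real newline and the captured group keeps that trailing \r or ^M. The LAZY variant below (using .*?) is my own assumption about the intended behaviour, not part of the pattern above, and both rely on the body starting with id=.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
HEADERS = [
    "POST / HTTP/1.1",
    "User-Agent: curl/7.27.0",
    "Host: 127.0.0.1",
    "Accept: */*",
    "Content-Length: 55",
    "Content-Type: application/x-www-form-urlencoded",
]

def make_request(eol):
    """Build the sample request using `eol` as the line terminator."""
    return eol.join(HEADERS + ["", BODY]) + eol

GREEDY = r'(?:(?:\^M)|[\n\r])+(id=.*)(?=(?:\^M)|[\n\r])'
# Hypothetical variant: the lazy .*? stops at the first line terminator.
LAZY = r'(?:\^M|[\n\r])+(id=.*?)(?=\^M|[\n\r])'

lf = make_request("\n")        # real LF only
crlf = make_request("\r\n")    # real CRLF
caret = make_request("^M\n")   # literal "^M" followed by a real LF

greedy_lf = re.findall(GREEDY, lf)        # body only
greedy_crlf = re.findall(GREEDY, crlf)    # body plus a trailing "\r"
greedy_caret = re.findall(GREEDY, caret)  # body plus a trailing "^M"

lazy_results = [re.findall(LAZY, data) for data in (lf, crlf, caret)]

print(greedy_crlf)
print(lazy_results)
```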


Upvotes: 1
