Python regular expression for HTTP Request header

Question

I have a question about Python regular expressions, which I don't have much experience with. I am working with HTTP request messages and parsing them with a regex. An HTTP GET message looks like this:

GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive

I want to extract the method, URI, User-Agent, and Host fields from the message. My regex for this is:

re.compile(r'^({0})\s+(\S+)\s+[^\n]*$\n.*^User-Agent:\s*(\S+)[^\n]*$\n.*^Host:\s*(\S+)[^\n]*$\n'.format('|'.join(methods)), re.MULTILINE|re.DOTALL)

But when the message arrives with the headers in a different order, for example:

GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive

I cannot match it, because the positions of Host and User-Agent have changed. So I need a generic regex that captures all of these fields regardless of the order in which the headers appear in the message.
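The order dependence can be reproduced with a short sketch. The `methods` list below is a hypothetical stand-in, since the question does not show how it is defined, and the two sample messages are the ones quoted above with a trailing newline added.

```python
import re

# Hypothetical method list; the question does not show its definition.
methods = ["GET", "POST", "HEAD"]

# Same pattern as in the question, split over several lines for readability.
template = (
    r'^({0})\s+(\S+)\s+[^\n]*$\n'
    r'.*^User-Agent:\s*(\S+)[^\n]*$\n'
    r'.*^Host:\s*(\S+)[^\n]*$\n'
)
pattern = re.compile(template.format('|'.join(methods)),
                     re.MULTILINE | re.DOTALL)

ua_first = (
    "GET / HTTP/1.0\n"
    "User-Agent: Wget/1.12 (linux-gnu)\n"
    "Accept: */*\n"
    "Host: 10.2.0.12\n"
    "Connection: Keep-Alive\n"
)
host_first = (
    "GET / HTTP/1.0\n"
    "Host: 10.2.0.12\n"
    "User-Agent: Wget/1.12 (linux-gnu)\n"
    "Accept: */*\n"
    "Connection: Keep-Alive\n"
)

first = pattern.search(ua_first)
second = pattern.search(host_first)

print(first.groups())  # ('GET', '/', 'Wget/1.12', '10.2.0.12')
print(second)          # None
```

The pattern requires a User-Agent line followed later by a Host line, so the second message fails as soon as Host comes first. Note also that `(\S+)` captures only the first token of the User-Agent value.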

Maria Zverina · Accepted Answer

How about parsing the whole header block into a dictionary, like so?

headers = """GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive"""


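lines = headers.splitlines()
# The first line is the request line: method, URI, and HTTP version.
first_line = lines.pop(0)
verb, url, version = first_line.split()
d = {'verb': verb, 'url': url, 'version': version}
for line in lines:
    # Split on the first ': ' only, so values containing ': ' stay intact.
    parts = line.split(': ', 1)
    if len(parts) < 2:
        continue  # skip blank or malformed lines
    field, value = parts
    d[field] = value

print(d)

print(d['User-Agent'])
print(d['url'])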
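If a regex is still preferred, one possible variant is to match the request line once and then collect every `Name: value` line independently, so the header order no longer matters. This is only a sketch: `parse_request`, `REQUEST_LINE`, and `HEADER_LINE` are illustrative names, and it assumes that only the request line and header block are passed in, not a message body. Keys are lowercased because HTTP header field names are case-insensitive.

```python
import re

# Request line: method, URI, and HTTP version separated by whitespace.
REQUEST_LINE = re.compile(r'^(\S+)\s+(\S+)\s+(\S+)')
# One header per line: a name without whitespace, a colon, then the value.
# The optional \r allows for CRLF line endings.
HEADER_LINE = re.compile(r'^([^:\s]+):[ \t]*(.*?)[ \t]*\r?$', re.MULTILINE)


def parse_request(raw):
    """Return the method, URI, version, and a dict of lowercased headers."""
    request_line = REQUEST_LINE.match(raw)
    if request_line is None:
        raise ValueError("input does not start with an HTTP request line")
    method, uri, version = request_line.groups()
    headers = {name.lower(): value
               for name, value in HEADER_LINE.findall(raw)}
    return {"method": method, "uri": uri,
            "version": version, "headers": headers}


ua_first = ("GET / HTTP/1.0\r\n"
            "User-Agent: Wget/1.12 (linux-gnu)\r\n"
            "Host: 10.2.0.12\r\n\r\n")
host_first = ("GET / HTTP/1.0\r\n"
              "Host: 10.2.0.12\r\n"
              "User-Agent: Wget/1.12 (linux-gnu)\r\n\r\n")

a = parse_request(ua_first)
b = parse_request(host_first)
print(a["method"], a["uri"], a["headers"]["host"])
print(a["headers"] == b["headers"])  # True: header order does not matter
```

Looking headers up by lowercased name also covers clients that send `host:` or `USER-AGENT:` instead of the capitalisation shown in the question.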
