Phiplex

Reputation: 159

Regex not finding HTTP header

I have some ('stolen' :)) Python code that uses a regex to parse all HTTP headers.

It looks like this:

import re

parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
header_list = [(key, value) for key, value in parser.findall(http_headers)]

Normally this works great, but the following header is not found:

Access-Control-Allow-Origin: *
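A minimal reproduction, assuming a hypothetical `http_headers` block in which only the `Access-Control-Allow-Origin` line comes from the question (the other two headers are made up for illustration). The pattern is the one quoted above:

```python
import re

# The pattern from the question: the value group is ".+" (one or more chars) followed by "\S"
parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')

# Hypothetical sample headers; only the Access-Control-Allow-Origin line is from the question
http_headers = (
    "Content-Type: text/html\r\n"
    "Access-Control-Allow-Origin: *\r\n"
    "Server: nginx\r\n"
)

header_list = [(key, value) for key, value in parser.findall(http_headers)]
print(header_list)
# → [('Content-Type', 'text/html'), ('Server', 'nginx')]
```

The other headers are parsed, but the `Access-Control-Allow-Origin` line is silently skipped.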

I think it may have something to do with the asterisk, but I'm not sure. As I understand it, the regex part

(?P<value>.+\S)

matches and captures . (any character) repeated + (one or more times), followed by \S (any non-whitespace character). Isn't the asterisk covered by that?

Any ideas?

Upvotes: 0

Views: 370

Answers (1)

brandonscript

Reputation: 72875

The problem here is actually quite simple. The final .+ matches one or more characters, and the \S after it requires one more non-whitespace character. tl;dr: the value group only matches values of two or more characters after the colon, and * is a single character.
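A small check of the two-character minimum, testing only the value part of the pattern in isolation (`value_part` and `fixed_value_part` are illustrative names, not from the original code):

```python
import re

# Value part of the original pattern: ".+" needs at least one character, "\S" one more
value_part = re.compile(r'.+\S')

print(value_part.fullmatch('*'))   # None: one character cannot satisfy both ".+" and "\S"
print(value_part.fullmatch('**'))  # a match: ".+" takes the first '*', "\S" the second

# With ".*" the leading part may be empty, so a single non-whitespace character is enough
fixed_value_part = re.compile(r'.*\S')
print(fixed_value_part.fullmatch('*'))  # a match
```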

Use * instead, so the preceding part matches 0 or more characters and the final \S supplies the one required character:

\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*
#                                 ^ * instead of +
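A sketch of the corrected pattern applied to the same kind of input, again assuming a hypothetical `http_headers` block where only the `Access-Control-Allow-Origin` line is from the question. Because the pattern has exactly two groups, `re.findall` already returns `(key, value)` tuples, so the list comprehension from the question is not strictly needed:

```python
import re

# Corrected pattern from the answer: ".*\S" lets a value be a single non-whitespace character
parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*')

# Hypothetical sample headers; only the Access-Control-Allow-Origin line is from the question
http_headers = (
    "Content-Type: text/html\r\n"
    "Access-Control-Allow-Origin: *\r\n"
    "Server: nginx\r\n"
)

# findall returns a list of (key, value) tuples since the pattern has two groups
header_list = parser.findall(http_headers)
print(header_list)
# → [('Content-Type', 'text/html'), ('Access-Control-Allow-Origin', '*'), ('Server', 'nginx')]
```

One caveat worth knowing: \s+ after the colon can also match line breaks, so a header with an empty value would make the regex take the following line as its value.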

Upvotes: 2
