regex not finding http header

Question

I have some ('stolen' :) Python code that uses a regex to parse all HTTP headers.

It is like this:

parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
header_list = [(key, value) for key, value in parser.findall(http_headers)]

Normally this works great, but the following header is not found:

Access-Control-Allow-Origin: *

I think it can have something to do with the asterisk, but I'm not sure. I think the regex part:

(?P<value>.+\S)

is used to match and group: . is any character, + means one or more times, and it is followed by \S, any non-whitespace character. Isn't the asterisk covered by that?

Any ideas?

brandonscript · Accepted Answer

The problem here is actually quite simple. The final .+ requires at least one character, and the \S that follows it requires one more non-whitespace character. tl;dr: the value group only matches values that are 2 or more characters long, and * is a single character.
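A quick way to see the two-character minimum is to test just the value part of the pattern on its own. This is a minimal sketch; the sample values are made up for illustration:

```python
import re

# Only the value part of the original pattern: one or more of any
# character, followed by exactly one non-whitespace character.
value_part = re.compile(r'.+\S')

# Illustrative sample values, not taken from a real response.
samples = ['*', 'ab', 'text/html']

# fullmatch() succeeds only if the whole string fits the pattern.
results = {s: value_part.fullmatch(s) is not None for s in samples}
print(results)  # {'*': False, 'ab': True, 'text/html': True}
```

The single-character value is the only one rejected, which is why the header with * goes missing.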

Use a * to look for 0 or more characters (plus the \S) instead:

\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*
#                                 ^ * instead of +
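To compare the two patterns side by side, here is a small sketch. It assumes the named groups are key and value (matching the tuple unpacking in the question), and the header block is a made-up sample rather than a real response:

```python
import re

# Hypothetical sample headers, one per line.
http_headers = (
    "Content-Type: text/html\n"
    "Access-Control-Allow-Origin: *\n"
    "Content-Length: 42\n"
)

# Original pattern: the value needs at least two characters (.+ then \S).
original = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
# Adjusted pattern: .* lets the value be a single non-whitespace character.
fixed = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*')

# findall() returns one (key, value) tuple per match.
missed = original.findall(http_headers)
header_list = fixed.findall(http_headers)

print(missed)       # the Access-Control-Allow-Origin line is skipped
print(header_list)  # all three headers, including ('Access-Control-Allow-Origin', '*')
```

With this sample input, only the adjusted pattern picks up the one-character value.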
