I have some ('stolen' :)) Python code that uses a regex to parse all HTTP headers.
It is like this:
import re
parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
header_list = [(key, value) for key, value in parser.findall(http_headers)]
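A minimal, runnable reproduction of the problem. The header block below is a made-up sample (not the actual response being parsed), and the pattern is the one above, unchanged:

```python
import re

# Hypothetical raw header block, one header per line (illustrative only).
http_headers = """Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Content-Length: 1234
"""

# Pattern from the question, unchanged.
parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.+\S)\s*')
header_list = [(key, value) for key, value in parser.findall(http_headers)]

for key, value in header_list:
    print(key, '=>', value)
# Prints only Content-Type and Content-Length; the
# Access-Control-Allow-Origin line produces no match at all.
```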
Normally this works great, but the following header is not found:
Access-Control-Allow-Origin: *
I suspect it has something to do with the asterisk, but I'm not sure. As I understand it, the regex part
(?P<value>.+\S)
matches and captures . (any character) + (one or more times), followed by \S (any non-whitespace character). Shouldn't the asterisk be matched by that?
Any ideas?
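The problem here is actually quite simple. In the value group, .+ requires at least one character, and the \S after it requires one more non-whitespace character.
tl;dr: the value must be at least 2 characters long, so a one-character value like * can never match. The asterisk itself is not special here; any one-character value would be skipped the same way.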
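To see the two-character minimum in isolation, here is a small sketch that tests only the value part of the pattern against a few sample values (the sample strings are arbitrary):

```python
import re

# Value part of the question's pattern: ".+" needs at least one character
# and "\S" needs one more, so the shortest possible match is two characters.
strict = re.compile(r'.+\S')
# Relaxed version: ".*" may match nothing, so a single "\S" is enough.
relaxed = re.compile(r'.*\S')

for value in ['*', '1', '**', 'text/html']:
    # fullmatch requires the pattern to cover the entire string.
    print(repr(value), bool(strict.fullmatch(value)), bool(relaxed.fullmatch(value)))
```

Both '*' and '1' fail the strict pattern and pass the relaxed one, which confirms the problem is the length of the value, not the asterisk.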
Use .* instead of .+ so the group matches zero or more characters followed by the \S, which means a single non-whitespace character is enough:
\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*
#                                 ^ * instead of +
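A sketch of the corrected pattern applied to a whole header block. The header text is a hypothetical sample, not the asker's real data:

```python
import re

# Hypothetical raw header block (illustrative only).
http_headers = """Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Content-Length: 1234
"""

# Corrected pattern: ".*\S" lets the value be a single character.
parser = re.compile(r'\s*(?P<key>.+\S)\s*:\s+(?P<value>.*\S)\s*')
# With exactly two groups, findall already returns (key, value) tuples.
header_list = parser.findall(http_headers)
headers = dict(header_list)
print(headers['Access-Control-Allow-Origin'])  # *
```

Because findall returns a list of tuples when the pattern has two groups, the list comprehension in the question is not needed. Note that the key group still uses .+\S, so a one-character header name would be skipped in the same way; the same .* change applies there if that case matters.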