Reputation: 441
I am trying to parse Set-Cookie headers with a regex in Python. I read RFC 6265 Section 4.1, which describes the syntax of the Set-Cookie header, and tried to build a regex from that specification. This is my current state:
([\x21\x23-\x27\x2A\x2B\x2D-\x39\x41-\x5A\x5E-\x7A\x7C\x7E]+)=[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*(;[\x20](((Expires|expires)=(Mon|Tue|Wed|Thu|Fri|Sat|Sun),[\x20][0-9]{2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-[0-9]{4}[\x20][0-9]{2}:[0-9]{2}:[0-9]{2}[\x20]GMT)|((Max-Age|max-age)=[1-9]+)|((Path|path)=[\x20-\x3A\x3C-\x7E]+)|(Secure|secure)|(HttpOnly|httponly)|([\x20-\x3A\x3C-\x7E]*)))*
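For comparison, here is a minimal sketch of only the cookie-pair part, assembled piece by piece from the grammar in RFC 6265 Section 4.1.1. It assumes that cookie-name is a token as defined in RFC 2616 Section 2.2, which excludes separators such as "/". Note that the range \x2D-\x39 in the regex above also covers \x2F ("/"). The names TOKEN, COOKIE_OCTET, COOKIE_VALUE, COOKIE_PAIR and parse_cookie_pair are made up for this illustration.

```python
import re

# cookie-name = token (RFC 2616 Section 2.2): printable ASCII minus separators,
# so "/" (0x2F), ":" and "=" are NOT allowed
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"

# cookie-octet (RFC 6265 Section 4.1.1): excludes CTLs, space, DQUOTE,
# comma, semicolon and backslash
COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
COOKIE_VALUE = rf'(?:"{COOKIE_OCTET}*"|{COOKIE_OCTET}*)'

# cookie-pair = cookie-name "=" cookie-value
COOKIE_PAIR = re.compile(rf"({TOKEN})=({COOKIE_VALUE})")


def parse_cookie_pair(text):
    """Return (name, value) if text is exactly one cookie-pair, else None."""
    m = COOKIE_PAIR.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None
```

For example, parse_cookie_pair("YSC=8sXes3YfFFF") returns ("YSC", "8sXes3YfFFF"). Building the pattern from named pieces keeps each part traceable to one grammar rule, which makes mistakes in the character ranges easier to spot.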
I have problems with the recursive definition of the subdomain in the Domain attribute of the Set-Cookie header (domain=...), which is described in RFC 1034 Section 3.5, and I need help expressing it as a regex.
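The recursive rule <subdomain> ::= <label> | <subdomain> "." <label> only ever produces one label followed by zero or more "." + label sequences, so it can be written without recursion as label(\.label)*. A minimal sketch, assuming labels as in RFC 1034 Section 3.5 relaxed by RFC 1123 Section 2.1 (a leading digit is allowed), the 63-character label limit, and an optional leading dot, which RFC 6265 Section 4.1.2.3 says user agents ignore even though the grammar does not permit it. DOMAIN_AV and domain_attribute are hypothetical names:

```python
import re

# <label>: starts and ends with a letter or digit, hyphens allowed inside,
# at most 63 characters (RFC 1034 Section 3.5; leading digit allowed per RFC 1123 Section 2.1)
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

# <subdomain> ::= <label> | <subdomain> "." <label>
# unrolls to: one label, then zero or more ".label" repetitions.
# The optional leading "." is tolerated (RFC 6265 Section 4.1.2.3: user agents ignore it).
DOMAIN_AV = re.compile(rf"Domain=(\.?{LABEL}(?:\.{LABEL})*)", re.IGNORECASE)


def domain_attribute(av):
    """Return the domain from a 'Domain=...' attribute, or None if it is malformed."""
    m = DOMAIN_AV.fullmatch(av.strip())
    return m.group(1) if m else None
```

With this, domain_attribute("domain=.youtube.com") returns ".youtube.com", while values such as "domain=youtube..com" or "domain=-bad.com" return None.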
My current regex also does not work completely as expected. For example, this Set-Cookie header
VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None, GPS=1; path=/; domain=.youtube.com; expires=Thu, 09-Jan-2020 00:47:35 GMT, YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly, VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None
contains 4 cookies (VISITOR_INFO1_LIVE twice, GPS and YSC), but my regex only catches 3 of them (the YSC cookie is missing). I tested this on https://regex101.com/.
Later I want to parse many Set-Cookie headers to get the names of the cookies (what the RFC calls cookie-name).
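A sketch for this goal, under the assumption (a heuristic, not something RFC 6265 guarantees) that a comma starts a new cookie only when it is followed by something that looks like name=. The comma inside an Expires date such as "Tue, 07-Jul-2020" is followed by "07-Jul-2020 ", which has no "=" before the next space, so it is not treated as a boundary. COOKIE_BOUNDARY and cookie_names are hypothetical names:

```python
import re

# Heuristic boundary: a comma (plus optional spaces) followed by "name=".
# The lookahead keeps "name=" in the next piece instead of consuming it.
COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def cookie_names(header):
    """Return the cookie-name of every cookie in a comma-joined Set-Cookie value."""
    names = []
    for cookie in COOKIE_BOUNDARY.split(header):
        pair = cookie.split(";", 1)[0]  # cookie-pair is everything before the first ';'
        name, sep, _ = pair.partition("=")
        if sep:  # skip fragments without '=' (not a cookie-pair)
            names.append(name.strip())
    return names


# the header from the question
EXAMPLE_HEADER = (
    "VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; "
    "expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None, "
    "GPS=1; path=/; domain=.youtube.com; expires=Thu, 09-Jan-2020 00:47:35 GMT, "
    "YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly, "
    "VISITOR_INFO1_LIVE=M_6WYFFF_fo; path=/; domain=.youtube.com; secure; "
    "expires=Tue, 07-Jul-2020 00:17:35 GMT; httponly; samesite=None"
)
```

cookie_names(EXAMPLE_HEADER) yields the four names VISITOR_INFO1_LIVE, GPS, YSC and VISITOR_INFO1_LIVE. An attribute value that itself contains ", x=" would still be split incorrectly. RFC 6265 Section 3 advises servers not to fold multiple Set-Cookie header fields into one, so if you can read the headers individually before they are joined with commas, the problem disappears.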
Thanks for your help!
Upvotes: 0
Views: 2800
Reputation: 341
After spending some more time on this question, I think that it is close to impossible to achieve what you desire using only regex.
There is no unique delimiter between cookies: the comma that separates cookies also appears inside attribute values, for example in the date of the Expires attribute (expires=Tue, 07-Jul-2020 ...). There is also no fixed number of attributes per cookie and no mandatory final attribute, so it is very difficult to write the negative part of the expression (what not to match).
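A small illustration of this point, using two cookies taken from the question's header: splitting on either delimiter alone cuts in the wrong places. This is only a demonstration with plain str.split, not a parser:

```python
# two cookies (GPS and YSC) from the header in the question, joined by ", "
header = "GPS=1; expires=Thu, 09-Jan-2020 00:47:35 GMT, YSC=8sXes3YfFFF; httponly"

# splitting on "," also cuts the Expires date in half
by_comma = header.split(",")
# ['GPS=1; expires=Thu', ' 09-Jan-2020 00:47:35 GMT', ' YSC=8sXes3YfFFF; httponly']

# splitting on ";" leaves the end of GPS and the start of YSC in one piece
by_semicolon = header.split(";")
# ['GPS=1', ' expires=Thu, 09-Jan-2020 00:47:35 GMT, YSC=8sXes3YfFFF', ' httponly']
```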
Upvotes: 1
Reputation: 341
Short answer, as you asked how to parse the cookies with regex:
([^;]+);?
Then iterate through the matches.
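As a sketch of what iterating through the matches looks like in Python (the helper name split_attributes is made up for this example), applied to a single cookie from the question:

```python
import re

# the pattern from this answer: a run of non-';' characters, optionally followed by ';'
PART = re.compile(r"([^;]+);?")


def split_attributes(set_cookie):
    """Return the ';'-separated parts of one Set-Cookie value, whitespace-trimmed."""
    return [m.group(1).strip() for m in PART.finditer(set_cookie)]


parts = split_attributes("YSC=8sXes3YfFFF; path=/; domain=.youtube.com; httponly")
# ['YSC=8sXes3YfFFF', 'path=/', 'domain=.youtube.com', 'httponly']

# the first part is the cookie-pair; the cookie-name is what precedes the first '='
name = parts[0].partition("=")[0]
# 'YSC'
```

Applied to the combined header from the question, a fragment such as "httponly, VISITOR_INFO1_LIVE=M_6WYFFF_fo" comes back as one match, so this pattern splits attributes but does not separate cookies.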
The way you have formulated the question indicates that you would also like to validate the cookies and probably also separate them.
Upvotes: 1