user2320462

Reputation: 259

Regex to handle cookies

I'm using HttpWebRequest/HttpWebResponse. The problem is that on some sites HttpWebResponse doesn't recognize the cookies. This is what response.Headers returns:

 Cookie1=1;domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT
 Cookie2= ; HTTPOnly= ; domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,
 Cookie5= ; domain=.host.com;path=/;HTTPOnly= ;version=1
 Cookie3=2; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.host.com;path=/;HTTPOnly= ;version=1
 Cookie4=3; domain=.host.com;path=/;version= 

Raw (response.Headers returns all the cookies as a single-line string):

 Cookie1=1;domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,Cookie2= ; HTTPOnly= ; domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,Cookie5= ; domain=.host.com;path=/;HTTPOnly= ;version=1,Cookie3=2; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.host.com;path=/;HTTPOnly= ;version=1,Cookie4=3; domain=.host.com;path=/;version= 

The following regex works fine for the name/value pairs:

(.*?)=(.*?);

But I also need to extract the domain and expiration date, and the 'domain' and 'expires' attributes appear in varying positions. How can I extract all the cookies along with their domain and expiration fields? Thanks!

Upvotes: 0

Views: 1337

Answers (1)

decPL

Reputation: 5402

You need something like the following:

@"Cookie(?<index>\d+)\s*=\s*((domain\s*=\s*(?<domain>.*?)[;,])|(expires\s*=\s*(?<expires>.*?GMT))|(.(?!Cookie\d+=)))*"

with the following options:

RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture

Depending on whether your times are all GMT, you may want to use something more sophisticated to capture the 'expires' part.
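
For illustration, here is a minimal sketch of how the pattern and options above could be applied. The class name and the shortened input string are made up for the example; the pattern itself is unchanged and still assumes cookie names of the form Cookie<N>, as in the sample from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class CookieHeaderDemo
{
    // Pattern from the answer, unchanged. It assumes every cookie is
    // named "Cookie<N>", which is only true for the sample data.
    const string Pattern =
        @"Cookie(?<index>\d+)\s*=\s*((domain\s*=\s*(?<domain>.*?)[;,])|(expires\s*=\s*(?<expires>.*?GMT))|(.(?!Cookie\d+=)))*";

    static void Main()
    {
        // Shortened copy of the raw header from the question (illustrative input).
        string raw =
            "Cookie1=1;domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT," +
            "Cookie3=2; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.host.com;path=/;HTTPOnly= ;version=1," +
            "Cookie4=3; domain=.host.com;path=/;version= ";

        var options = RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture;

        // One match per cookie; the named groups hold the last value
        // captured for that cookie, or are unsuccessful if absent.
        foreach (Match m in Regex.Matches(raw, Pattern, options))
        {
            Group domain = m.Groups["domain"];
            Group expires = m.Groups["expires"];

            Console.WriteLine("Cookie{0}: domain={1}, expires={2}",
                m.Groups["index"].Value,
                domain.Success ? domain.Value : "(none)",
                expires.Success ? expires.Value : "(none)");
        }
    }
}
```

Checking Success before reading a group distinguishes a missing attribute from an empty one. Note that the domain branch requires a trailing ';' or ',', so a domain that ends the whole string would not be captured by this pattern.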

Upvotes: 1
