Reputation: 839
I am having trouble parsing the HTTP "Via" header sent by a client's browser. This is an example of an HTTP request that I received:
GET / HTTP/1.0
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*
Accept-Language: sr-Latn-RS
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MATM; AskTbGOM2/5.8.0.12304)
Accept-Encoding: gzip, deflate
Host: 10.0.1.7
Via: 1.1 smtp.local:3128 (squid/2.6.STABLE21)
X-Forwarded-For: 10.0.0.75
Cache-Control: max-age=259200
Connection: keep-alive
Now I need to extract the smtp.local:3128 part from this header, but the regex I wrote does not work.
Example pattern, written in C# (doesn't work):
string matchHttpVia = @"Via: 1.1 (\.+:\d+)";
Note that there could also be an IP instead of a hostname.
Upvotes: 1
Views: 303
Reputation: 11923
To parse Via: x.x host:port you can use the regex:
Via: \d+\.\d+ (.*:\d+) (\(.*\))?
Actually, this shorter version should also be sufficient:
Via: \d+\.\d+ (.*:\d+)
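That should do the trick for any numeric version followed by host:port, whether the host is a name or an IP address. It will not match a Via entry that has no port or that puts a protocol name in front of the version, such as HTTP/1.1.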
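For reference, here is a minimal sketch of how the shorter pattern could be used from C#. The class and method names (ViaExample, ExtractHostPort) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ViaExample
{
    // Pattern from this answer: numeric version, a space, then host:port in group 1.
    private static readonly Regex ViaPattern = new Regex(@"Via: \d+\.\d+ (.*:\d+)");

    // Hypothetical helper: returns the host:port part, or null when nothing matches.
    public static string ExtractHostPort(string headerLine)
    {
        Match m = ViaPattern.Match(headerLine);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Prints "smtp.local:3128"
        Console.WriteLine(ExtractHostPort("Via: 1.1 smtp.local:3128 (squid/2.6.STABLE21)"));
        // Prints "10.0.0.1:8080"
        Console.WriteLine(ExtractHostPort("Via: 1.0 10.0.0.1:8080"));
    }
}
```

Because .* is greedy, group 1 extends to the last colon followed by digits on the line, so a later colon in the header value (for example inside the comment) could pull extra text into the capture.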
Upvotes: 2
Reputation: 3250
As Konerak commented, removing the backslash before the dot, giving Via: 1.1 (.*:\d+), should fix your problem. \. matches only a literal dot character, whereas . matches any character.
Note though, that this will only work if "1.1" is the only thing that can appear between the "Via:" and the hostname/IP. I don't know enough about HTTP headers to know if that's a safe assumption, but it seems like it might not be.
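On that point: in the HTTP/1.1 grammar (RFC 7230, section 5.7.1) each Via entry is an optional protocol name plus a version, then either a host with an optional port or a pseudonym, then an optional comment, and several entries may be separated by commas. A rough sketch of a more tolerant pattern that reads only the first entry follows. The names (ViaFirstHop, GetFirstReceivedBy) are hypothetical, and the pattern is an approximation of the grammar, not a full parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ViaFirstHop
{
    // Optional "name/" prefix, a version token, then host (or [IPv6]) with an optional :port.
    // Multiline lets ^ match at the start of any header line in a full request.
    private static readonly Regex ViaPattern = new Regex(
        @"^Via:[ \t]*(?:[^\s/,]+/)?[^\s/,]+[ \t]+(?<host>\[[^\]\s]+\]|[^\s,:()]+)(?::(?<port>\d+))?",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Hypothetical helper: returns "host:port", or just the host/pseudonym when no port is given.
    public static string GetFirstReceivedBy(string headers)
    {
        Match m = ViaPattern.Match(headers);
        if (!m.Success) return null;

        string host = m.Groups["host"].Value;
        Group port = m.Groups["port"];
        return port.Success ? host + ":" + port.Value : host;
    }

    public static void Main()
    {
        string request =
            "GET / HTTP/1.0\r\n" +
            "Host: 10.0.1.7\r\n" +
            "Via: 1.1 smtp.local:3128 (squid/2.6.STABLE21)\r\n" +
            "X-Forwarded-For: 10.0.0.75\r\n";

        // Prints "smtp.local:3128"
        Console.WriteLine(GetFirstReceivedBy(request));
        // Prints "10.0.0.1:8080"
        Console.WriteLine(GetFirstReceivedBy("Via: HTTP/1.1 10.0.0.1:8080"));
    }
}
```

The host part is matched with a character class that excludes whitespace, commas and colons instead of .*, so the capture cannot run past the first entry.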
Upvotes: 0