Ben Sch

Reputation: 2949

C# regular expressions for URL query strings

I have the following scenario:

I get an affiliate network URL and need to append an appropriate URL parameter for tracking purposes (subID).

The actual problem: in some cases even a single affiliate network uses different query string formats. Example:

1) http://impde.sampleaffiliate.com/imp?pop(over)g(XXXXX)a(XXX)subid(subIdValue)

or

2) http://clkde.sampleaffiliate.com/click?p=XXX&a=XXX&g=XXX&subid=subIdValue

The recognition of the affiliate network is pretty simple [url.Contains("sampleaffiliate")], but to get the query string format, I'm using regular expressions:

//query string parameter values are in brackets, e.g. ?a(12312)b(12343432)c(4242)
Regex parametersInBrackets = new Regex(@"^[\?]{1}\w+(\(.*\))+$");
//query string parameter values are separated by ampersands and equal signs, e.g. ?a=12312&b=12343432&c=4242
Regex parametersWithAmpersand = new Regex(@"^[\?]{1}.+(\&\w+\=.+)+$");

Those work fine for the "normal cases".

But here comes an additional difficulty - look at the following URL:

http://pdt.sampleaffiliate.com/click?a(AAA)p(BBB)prod(CCC)ttid(DDD)url(http://www.example.com/item.asp?param1=EEE&param2=FFF&param3=GGG)

In this case they use the name(value)name(value) notation in the query string, but the value of the last parameter ("url") is another URL that uses the name=value&name=value notation. That makes it really hard for the regexes to tell which of the two formats applies to the outer query string.

My current regular expressions both return "true" on IsMatch(uri.Query) for the last example.

Any ideas how to fix this?

Thanks in advance!

Upvotes: 2

Views: 5800

Answers (2)

ThisGuy

Reputation: 2385

The "difficult link" you are getting is not properly URL-encoded, so I suspect the built-in ParseQueryString won't work on it (and I assume the link format is unfortunately out of your control).

You can use the following Regex to parse it into pieces:

^[\?]{1}(\w+\([^\)]+\))+$

a(AAA)
p(BBB)
prod(CCC)
ttid(DDD)
url(http://www.example.com/item.asp?param1=EEE&param2=FFF&param3=GGG)

Try this Regex first; if it matches, use it. If it fails, fall back to the built-in ParseQueryString.
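As a rough sketch of how the pieces could be read back in C#: the pattern below is the same as the one above, with named groups added so each repetition can be retrieved through .NET's Captures collection. The class and method names (BracketQuery, TryParse) are made up for illustration, and the sketch assumes that values never contain a closing bracket and are never empty.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BracketQuery
{
    // Same idea as the pattern above; <name> and <value> are added (hypothetical
    // group names) so that every name(value) pair can be read from the match.
    static readonly Regex Pattern =
        new Regex(@"^\?(?:(?<name>\w+)\((?<value>[^)]+)\))+$");

    // Returns the pairs in order, or null when the query string is not
    // written entirely in name(value) notation.
    public static List<KeyValuePair<string, string>> TryParse(string query)
    {
        Match m = Pattern.Match(query);
        if (!m.Success) return null;

        // A group inside a repetition keeps one Capture per iteration.
        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;

        var pairs = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < names.Count; i++)
            pairs.Add(new KeyValuePair<string, string>(names[i].Value, values[i].Value));
        return pairs;
    }
}
```

If TryParse returns null for uri.Query, the query is not in bracket notation and can be handed to ParseQueryString instead. Because the nested URL sits inside the last pair of brackets, its ampersands and equal signs end up in the value of "url" and do not affect the decision.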

Upvotes: 2

Parimal Raj

Reputation: 20575

You can use the static ParseQueryString() method of the System.Web.HttpUtility class, which returns a NameValueCollection of parameter names and values.

Uri myUri = new Uri("http://clkde.sampleaffiliate.com/click?p=XXX&a=XXX&g=XXX&subid=subIdValue");
string param1 = HttpUtility.ParseQueryString(myUri.Query).Get("p");

Check documentation at http://msdn.microsoft.com/en-us/library/ms150046.aspx
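For the tracking use case from the question, the same collection can probably also be used to write the parameter back. This is only a sketch: the helper name WithSubId is made up, and it relies on the collection returned by ParseQueryString rendering itself as an encoded name=value&name=value string in ToString(), which a plain NameValueCollection would not do. It only applies to the ampersand format, not to the name(value) notation.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

static class AmpersandQuery
{
    // Hypothetical helper: sets (or replaces) the "subid" parameter of a URL
    // whose query string uses the name=value&name=value format.
    public static string WithSubId(string url, string subId)
    {
        var builder = new UriBuilder(url);

        // Parse the existing query into an editable collection.
        NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
        query["subid"] = subId;

        // Assumption: ToString() on this collection yields the encoded query string.
        builder.Query = query.ToString();

        // Going through Uri drops the default port that UriBuilder may add.
        return builder.Uri.AbsoluteUri;
    }
}
```

A call such as AmpersandQuery.WithSubId(myUri.AbsoluteUri, "myTrackingId") would then return the link with the new subid value in place of the old one.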

Upvotes: 2
