James

Reputation: 1410

Regex to match ampersands in a URI that are followed by an equals and not another ampersand

My regex knowledge is escaping me on this one...

Say I have a URL with a URI as a query parameter, like so:

http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company

...assuming our uri param doesn't contain any params itself, I want to manually parse out the query params in JavaScript, but obviously the ampersand in our embedded uri param makes it more difficult than simply splitting on all ampersands and running with it from there.

What I really want to do is define a regex that matches only question marks and ampersands that are followed by an equals sign before being followed by another ampersand (or end of line). I came up with this, which comes close but includes the non-capturing text as well, and I'm not sure why:

[?&](?:[^&]+)=

...that results in a match on ?uri= as well as &type= which is close but capturing more than I want. What am I doing wrong such that it's not capturing just the ? and & in matches? In other words, it should only be capturing the ? prior to uri and the & prior to type.

Upvotes: 6

Views: 15017

Answers (1)

JDiPierro

Reputation: 802

If I understand correctly and you just want to match the ? or &, then your regex should be:

[?&](?==)

Explanation:

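[?&] is a character set containing just ? and &, meaning it matches exactly one of those characters.

(?= ) is a positive lookahead. It means "this has to come right after the main match, but don't include it in the match". Using it to find an = looks kind of funny: (?==). Note that the lookahead checks the text immediately following the ? or &, so this pattern only matches when the very next character is =.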
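
To make the difference between a lookahead and a non-capturing group concrete, here is a small JavaScript sketch. The short test string and the variable names are made up for illustration; only the lookahead pattern and the URL come from the thread.

```javascript
// Illustrative input (not from the question): "?" and "&" each directly followed by "=".
const text = "a?=1&=2";

// A non-capturing group still consumes what it matches, so "=" ends up in the result.
const withGroup = text.match(/[?&](?:=)/g);
console.log(withGroup); // [ '?=', '&=' ]

// A lookahead only checks what follows; the "=" is left out of the match.
const withLookahead = text.match(/[?&](?==)/g);
console.log(withLookahead); // [ '?', '&' ]

// The lookahead inspects the very next character, so on the question's URL
// (where "?" is followed by "uri", not "=") this pattern finds nothing.
const url = "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";
const onUrl = url.match(/[?&](?==)/g);
console.log(onUrl); // null
```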


If you want to include the word "uri" or "type" in the match, then add \w+ after the character set and before the lookahead:

[?&]\w+(?==)

\w matches a word character (letter, digit, or underscore), and + means "match 1 or more".


And just one more in case that's not exactly what you're looking for! If you want to get rid of the &/? but keep the text, wrap the character set in a positive lookBEHIND. The syntax for that is (?<= ). That would change the regex to this:

(?<=[?&])\w+(?==)

Example of that at work: http://regexr.com?35q0u
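
A quick sketch of the last two patterns in JavaScript, run against the URL from the question. Lookbehind only became part of JavaScript with ES2018, so this assumes a reasonably modern engine; the variable names are illustrative.

```javascript
const url = "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";

// Without lookbehind: the leading "?" or "&" is part of each match.
const withDelimiter = url.match(/[?&]\w+(?==)/g);
console.log(withDelimiter); // [ '?uri', '&type' ]

// With lookbehind: the delimiter must be there, but is not included in the match.
const namesOnly = url.match(/(?<=[?&])\w+(?==)/g);
console.log(namesOnly); // [ 'uri', 'type' ]
```

The "&" inside "YoYo+&+Co+Inc" is skipped in both cases because it is followed by "+", which \w does not match.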


In response to comment: You can match just the ? and & by putting the \w+ inside of the positive lookahead:

[?&](?=\w+=)
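
Since the original goal was to parse the query parameters by hand, here is a rough sketch of how this pattern could be used with split(). The function name parseQuery is hypothetical, and the sketch assumes, as the question does, that the embedded uri value contains no name=value pairs of its own.

```javascript
// Hypothetical helper: splits on "?" or "&" only when a parameter name and "=" follow.
function parseQuery(url) {
  const params = {};
  // The first piece is everything before the query string, so skip it.
  const pieces = url.split(/[?&](?=\w+=)/).slice(1);
  for (const piece of pieces) {
    const eq = piece.indexOf("="); // the first "=" separates name from value
    params[piece.slice(0, eq)] = piece.slice(eq + 1);
  }
  return params;
}

const url = "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";
const parsed = parseQuery(url);
console.log(parsed.uri);  // http://website.com/company/YoYo+&+Co+Inc
console.log(parsed.type); // company
```

The values are left exactly as they appear in the URL; decoding them (for example with decodeURIComponent) would be a separate step.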

And because I'm bored and like regexes a bit too much, here's one that will match the value of each parameter:

(?<==).*?(?=[&?]\w+=|$)

Example: http://regexr.com?35q11 There are multiple highlighted sections because global matching is on.
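
The same value-matching pattern can be tried in JavaScript as sketched below, again assuming an engine with ES2018 lookbehind support. The variable names are illustrative.

```javascript
const url = "http://hostname.com?uri=http://website.com/company/YoYo+&+Co+Inc&type=company";

// Start right after an "=", then lazily consume text until the next
// "?name=" / "&name=" delimiter or the end of the string.
const values = url.match(/(?<==).*?(?=[&?]\w+=|$)/g);
console.log(values); // [ 'http://website.com/company/YoYo+&+Co+Inc', 'company' ]
```

The g flag plays the role of regexr's global matching here: without it, match() would return only the first value.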

Upvotes: 8
