Paul

Reputation: 9541

Regex to replace ampersands, but not when they're in a URL

So I have this regex:

&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)

That matches every & in a block of text that is not already the start of a character entity (such as &amp; or &#38;).

However, if I have this string:

& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^

... the marked & also gets targeted, and since I'm using the regex to replace each & with &amp;, the URL then becomes invalid:

http://localhost/MyFile.aspx?mything=2&amp;this=4
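The behaviour can be reproduced with a minimal sketch. Python is used here purely for illustration, since the question does not say which language runs the regex, and the names BARE_AMP, sample and encoded are made up for the example. The sample string is shortened and includes one existing &amp; to show that entities are skipped.

```python
import re

# The pattern from the question: an ampersand that does not already
# start a character entity such as &amp; or &#38;.
BARE_AMP = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')

sample = '&amp; & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'

# Replace every bare ampersand, including the one inside the href value.
encoded = BARE_AMP.sub('&amp;', sample)
print(encoded)
# &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>
```

The existing &amp; is left alone, while the & inside the href is encoded along with the others.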

D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?

Upvotes: 0

Views: 1957

Answers (2)

Ro Yo Mi

Reputation: 15000

In PowerShell, this could be done as:

$String ='& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
$String -replace '(?<!<[^<>]*)&', "&amp;"

yields

&amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>

Dissecting the regex:

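  1. The negative lookbehind (?<!<[^<>]*) checks that the & is not preceded by a < with no < or > in between, that is, that the & is not inside a tag. This relies on variable-length lookbehind, which the .NET regex engine used by PowerShell supports.
  2. Every & that passes that check is then matched and replaced with &amp;.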
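Not every regex engine accepts a variable-length lookbehind. Python's re module, for one, rejects it. As a rough sketch of a different technique that gives the same result on this input, you can match whole tags and bare ampersands in one alternation and only rewrite the ampersands. The names TAG_OR_AMP and encode_outside_tags are hypothetical, and unlike the lookbehind this version does not skip an & that follows an unclosed <.

```python
import re

# Match either a complete tag (captured in group 1) or a single ampersand.
TAG_OR_AMP = re.compile(r'(<[^<>]*>)|&')

def encode_outside_tags(text: str) -> str:
    # Tags are emitted unchanged; any ampersand matched outside a tag
    # becomes &amp;.
    return TAG_OR_AMP.sub(lambda m: m.group(1) or '&amp;', text)

sample = '& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
result = encode_outside_tags(sample)
print(result)
# &amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>
```

Like the PowerShell one-liner, this encodes every & outside a tag, so text that already contains entities such as &amp; would be encoded a second time.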

Upvotes: 0

Guffa

Reputation: 700362

No, the URL does not become invalid. The HTML code becomes:

<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">

This means that the code that was not correctly encoded before is now correctly encoded, and the actual URL that the link contains is:

http://localhost/MyFile.aspx?mything=2&this=4

So it's not a problem that the & character in the code gets encoded. On the contrary, the code is now correct.
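To check this yourself, here is a small sketch using Python's standard html.parser module (Python is just my choice for the demonstration, and HrefCollector is a made-up helper). The parser decodes character references in attribute values, so the href it reports is the URL the link really points to.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive with
        # character references such as &amp; already decoded.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

collector = HrefCollector()
collector.feed('<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>')
print(collector.hrefs[0])
# http://localhost/MyFile.aspx?mything=2&this=4
```

The &amp; exists only in the HTML source. The URL that is actually requested contains a plain &.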

Upvotes: 4
