Jiew Meng

Reputation: 88337

URL/HTML Escaping/Encoding: Is escaping `&` in URLs actually required?

I have always been very confused by URL/HTML escaping. More recently I looked deeper into it and came across this example in the PHP docs for urlencode:

$query_string = 'foo=' . urlencode($foo) . '&bar=' . urlencode($bar);
echo '<a href="mycgi?' . htmlentities($query_string) . '">';

I then realized that there's an & in most query strings, which seems like it should be escaped. But it seems to work without escaping. I wonder why, and whether it's actually required.

Upvotes: 1

Views: 2665

Answers (2)

Pekka

Reputation: 449783

Escaping & into &amp; is required in HTML, but it works in most browsers anyway. If it didn't, 90% of the Internet would break. :) It is still good style to escape ampersands, and it is required for the document to pass validation.

See this W3C document for some good background why (the text focuses on a specific behaviour of PHP, but that doesn't really matter): Ampersands, PHP Sessions and Valid HTML. Money quote (emphasis mine):

In order to display reserved characters HTML and XHTML provide a mechanism called character references. The syntax of these is:

  • an ampersand
  • a "code" for the referenced character
  • a semicolon

For example, the "less than" character is represented as &lt;.

Giving the ampersand special meaning makes it, like <, a reserved character, so it also needs to be represented by an entity for it to be used in a document - &amp;
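
To make that concrete, here is a minimal sketch of the two escaping layers for the link from the question. The sample values of `$foo` and `$bar` are made up for illustration, and `htmlspecialchars` is used in place of `htmlentities`; both turn & into &amp;.

```php
<?php
// Hypothetical sample values, chosen so that both layers have work to do.
$foo = 'a b';
$bar = 'x&y';

// Layer 1: URL-encode each value so it cannot break the query string.
$query_string = 'foo=' . urlencode($foo) . '&bar=' . urlencode($bar);

// Layer 2: HTML-escape the whole URL before placing it in an attribute.
$href = htmlspecialchars('mycgi?' . $query_string, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $href . '">link</a>', "\n";
// prints: <a href="mycgi?foo=a+b&amp;bar=x%26y">link</a>
```

The & inside the value of `$bar` is handled by URL encoding (`%26`), while the & that separates the two parameters is handled by HTML escaping (`&amp;`).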

Upvotes: 4

Spudley

Reputation: 168803

You're right.

Inside an HTML document, the ampersand character (&) is not allowed, except when specifying an entity (such as &amp;).

Therefore, code such as <a href='mycgi?foo=1&bar=2'> is invalid HTML. It should throw an error if you run it through a validator.

Most (all?) browsers will cope with it without an error though. There's no ambiguity here, so it will work.

However, it is still a good idea to convert them into entities anyway, because there is always the possibility of an ambiguity creeping in - for example, if you have a parameter in your URL named amp instead of bar, how would the browser deal with that? It's not quite as clear cut. So you should convert them all to entities to avoid any future issues, even if you don't have any now.
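
As a rough illustration of the `amp` case, the sketch below builds the query string, escapes it for HTML, and then reverses the escaping to show that the original URL comes back intact. The parameter names and values are hypothetical, and the decoding step only approximates what a browser does when it reads the attribute.

```php
<?php
// Hypothetical parameters; "amp" is the awkward name discussed above.
$params = ['foo' => 1, 'amp' => 2];

// Build the raw query string with an explicit "&" separator.
$query = http_build_query($params, '', '&');

// Escape for use inside an HTML attribute: every & becomes &amp;
$escaped = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

// Reversing the HTML escaping recovers the raw query string.
$decoded = htmlspecialchars_decode($escaped, ENT_QUOTES);

echo $query, "\n";    // prints: foo=1&amp=2
echo $escaped, "\n";  // prints: foo=1&amp;amp=2
echo $decoded, "\n";  // prints: foo=1&amp=2
```

With the escaped form in the markup there is nothing left for the browser to guess: `&amp;` can only mean a literal &, and the `amp=2` that follows is plain text.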

Upvotes: 1
