Reputation: 88187
I have always been confused with URL/HTML encoding/escaping. I am using PHP, so I want to clear some things up.
Can I say that I should always use:
urlencode: for individual query string parts
$url = 'http://test.com?param1=' . urlencode('some data') . '&param2=' . urlencode('something else');
htmlentities: for escaping special characters like <>, so that they will be rendered properly by the browser
Are there any other places where I might use each function? I am not good with escaping in general and am always confused by it.
Upvotes: 20
Views: 31762
Reputation: 165193
First off, you shouldn't be using htmlentities() around 99% of the time. Instead, you should use htmlspecialchars() for escaping text for use inside XML and HTML documents.
htmlentities() is only useful for displaying characters that the character set you're using can't represent (for example, if your pages are in ASCII but you have some UTF-8 characters you would like to display). Instead, just make the whole page UTF-8 (it's not hard) and be done with it.
As far as urlencode(), you hit the nail on the head.
So, to recap:
Inside HTML:
<b><?php echo htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?></b>
Inside of a URL:
$url = '?foo=' . urlencode('bar');
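To make the two contexts in the recap concrete, here is a minimal sketch that runs the same string through both functions. The sample value in $input is made up for illustration; only htmlspecialchars() and urlencode() come from the answer above.

```php
<?php
// Hypothetical input containing characters that are special in both HTML and URLs.
$input = 'Tom & "Jerry" <b>';

// HTML context: &, <, > and both quote types become entities.
$html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// URL context: reserved characters are percent-encoded and spaces become "+".
$query = '?name=' . urlencode($input);

echo $html, "\n";   // Tom &amp; &quot;Jerry&quot; &lt;b&gt;
echo $query, "\n";  // ?name=Tom+%26+%22Jerry%22+%3Cb%3E
```

The two outputs are not interchangeable: the first is safe to place in HTML text or attributes, the second is safe as a query string value.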
Upvotes: 34
Reputation: 117417
That's about right, although htmlspecialchars is fine as long as you get your charsets straight, which you should do anyway. I tend to use it for that reason: I find out early if I have messed the charset up.
Also note that if you put a URL into an HTML context (say, in the href of an a-tag), you need to escape that too. So you'll often see something like:
echo "<a href='" . htmlspecialchars("?foo=" . urlencode($foo)) . "'>clicky</a>";
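The layering matters most once the URL has more than one parameter, because the & separator is itself special in HTML. The following is a rough sketch of that case; the value of $foo and the second parameter, lang, are invented for illustration. ENT_QUOTES is passed explicitly because the attribute is single-quoted and, as far as I know, PHP versions before 8.1 did not escape single quotes by default.

```php
<?php
// Hypothetical value; the "&" inside it must not be confused with a separator.
$foo = 'a b&c';

// Step 1: URL-encode each value, then join the parts with a literal "&".
$url = '?foo=' . urlencode($foo) . '&lang=' . urlencode('en');

// Step 2: HTML-escape the finished URL before placing it in the attribute.
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo "<a href='" . $href . "'>clicky</a>\n";
// <a href='?foo=a+b%26c&amp;lang=en'>clicky</a>
```

The browser decodes &amp; back to & when it reads the attribute, so the link still points at ?foo=a+b%26c&lang=en.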
Upvotes: 19
Reputation: 33238
If you are building a query string for your URL, then it's best to just use http_build_query() instead of manually encoding each part.
$params = [
'param1' => 'some data',
'param2' => 'something else',
];
echo '<a href="https://test.com?'.htmlspecialchars(http_build_query($params)).'">Link</a>';
Everything you output into HTML should be HTML-encoded too, even though there is only a very small chance that a properly encoded URL will break the HTML.
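For reference, here is a small sketch of what http_build_query() produces for the $params array above, before and after HTML escaping. Passing '&' as the separator explicitly is my own choice, so the result does not depend on the arg_separator.output ini setting. The PHP_QUERY_RFC3986 variant is shown only for comparison.

```php
<?php
$params = [
    'param1' => 'some data',
    'param2' => 'something else',
];

// Default encoding (RFC 1738 style): spaces become "+".
$query = http_build_query($params, '', '&');

// Optional RFC 3986 style: spaces become "%20".
$strict = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

// HTML-escape the query string before embedding it in markup.
$escaped = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

echo $query, "\n";    // param1=some+data&param2=something+else
echo $strict, "\n";   // param1=some%20data&param2=something%20else
echo $escaped, "\n";  // param1=some+data&amp;param2=something+else
```

Only the & separator changes in the escaped version here, which is why unescaped links often appear to work. Escaping anyway removes any ambiguity about sequences such as &para at the start of &param2.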
Upvotes: 0