Reputation: 10662
A common technique to prevent XSS attacks is to encode untrusted data before displaying it on the HTML page. The data can appear in several different contexts within the page, and each context requires a different encoding.
Encoding the responses on the server side doesn't make sense because at this layer we don't know where in the HTML page the data will appear.
So it is more convenient and more reasonable to encode on the client side. The question is whether that is safe. At first glance it sounds unsafe because the attacker can modify the client code (say, JavaScript). But when you think about it, the modified code is only available in the attacker's own browser. Other visitors of the web site won't be affected by the changes.
Is it still safe or am I missing something?
Upvotes: 1
Views: 1263
Reputation: 1318
Fighting XSS on the server side is not as difficult as you think. The goal is to know which characters you should encode in which context.
Basically, there are 4 different contexts that we should consider for XSS.
HTML Context
As an attacker, to get XSS in this context you need the < and > characters. Therefore, encoding only these two characters solves XSS in the HTML context.
/**
* XSS protection function for HTML context only
* @usecases
* <title>use this function if output reflects here or as a content of any HTML tag.</title>
* e.g., <span>use this function if output reflects here</span>
* e.g., <div>use this function if output reflects here</div>
* @description
* Sanitize/Filter < and > so that attacker can not leverage them for JavaScript execution.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function htmlContextCleaner($input) {
    $bad_chars = array("<", ">");
    $safe_chars = array("&lt;", "&gt;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
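Since the question is about encoding on the client, here is a minimal TypeScript sketch of the same idea for the HTML context. The name encodeHtmlContext is hypothetical, and unlike the PHP function above it also encodes &, which is an addition of this sketch so that existing entities in the input are shown literally.

```typescript
// Hypothetical client-side counterpart of htmlContextCleaner (not from the cited repo).
function encodeHtmlContext(input: string): string {
  // Encode "&" first so the entities produced below are not encoded twice.
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const payload = "<script>alert(1)</script>";
console.log(encodeHtmlContext(payload));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

The result is only meant for element content such as <div>...</div>. It is not sufficient for attribute, script, or style contexts.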
JavaScript Context
The most common use cases look like the following.
<script> var name = 'USERINPUTISHERE';</script>
or
<script> var name = "USERINPUTISHERE";</script>
or
<button type="submit" onclick="return callSomeFunction('USERINPUTHERE')">
To protect your application against XSS in the JS context, you need to encode 6 specific characters. Please read the following description to understand the attack vectors for the script context.
/**
* XSS protection function for script context only
* @usecases
* @double quoted case e.g.,
* <script> var searchquery = "use this function if output reflects here"; </script>
* @single quoted case e.g.,
* <script> var searchquery = 'use this function if output reflects here'; </script>
* @description
* Sanitize/Filter meta or control characters that attacker may use to break the context e.g.,
* "; confirm(1); " OR '; prompt(1); // OR </script><script>alert(1)</script>
* \ and % are filtered because they may break the page e.g., \n or %0a
* & is sanitized because of complex or nested context (if in use)
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
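function scriptContextCleaner($input) {
    // "&" is listed first: str_replace() applies the pairs in order, so a later
    // "&" rule would re-encode the entities produced by the earlier rules.
    $bad_chars = array("&", "\"", "<", "'", "\\\\", "%");
    $safe_chars = array("&amp;", "&quot;", "&lt;", "&apos;", "&bsol;", "&percnt;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}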
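On the client, a different technique is often used for this context: instead of HTML entities, the value is turned into a JavaScript string literal with JSON.stringify, and the characters that could end the script block are written as \u escapes. The sketch below assumes the value is placed inside a <script> block; encodeScriptContext is a hypothetical name and this is not the method of the PHP function above.

```typescript
// Hypothetical helper: returns a complete, quoted JavaScript string literal.
function encodeScriptContext(input: string): string {
  return JSON.stringify(input)      // escapes ", \ and control characters
    .replace(/</g, "\\u003c")       // stops </script> from closing the block
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")  // line separators that older engines treat as newlines
    .replace(/\u2029/g, "\\u2029");
}

const attack = '"; confirm(1); "';
console.log(`<script> var searchquery = ${encodeScriptContext(attack)}; </script>`);
// <script> var searchquery = "\"; confirm(1); \""; </script>
```

The returned literal already includes its surrounding double quotes, so the template must not add its own. For an inline handler such as onclick, the literal would additionally need attribute encoding, because the browser decodes the attribute before running the script.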
Attribute Context
The following code is an example of the attribute context.
<input name="fname" value="USERINPUTISHERE">
Or the single-quoted form of the same example.
<input name='fname' value='USERINPUTISHERE'>
Basically, we need to encode a few very specific characters to make it secure. We need to encode back-ticks too in order to keep the context secure in old versions of IE. Please read the following description and code.
/**
* XSS protection function for an attribute context only
* @usecases
* @double quoted case e.g.,
* <div class="use this function if output reflects here">attribute context</div>
* In above example class attribute have been used but it can be any like id or alt etc.
* @single quoted case e.g.,
* <input type='text' value='use this function if output reflects here'>
* @description
* Sanitize/Filter meta or control characters that attacker may use to break the context e.g.,
* "onmouseover="alert(1) OR 'onfocus='confirm(1) OR `onmouseover=prompt(1)
* back-tick i.e., ` is filtered because old IE browsers treat it as a valid separator.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
function attributeContextCleaner($input) {
    $bad_chars = array("\"", "'", "`");
    $safe_chars = array("&quot;", "&apos;", "&grave;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}
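A client-side version of the attribute encoding could look like the following TypeScript sketch. The names ATTRIBUTE_ENTITIES and encodeAttributeContext are hypothetical, and encoding &, < and > in addition to the three characters handled by the PHP function is an assumption of this sketch, not something taken from the cited repo.

```typescript
// Hypothetical lookup table: each character that can end or confuse a quoted
// attribute value is mapped to an HTML entity.
const ATTRIBUTE_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  '"': "&quot;",
  "'": "&#39;",
  "`": "&#96;",
  "<": "&lt;",
  ">": "&gt;",
};

function encodeAttributeContext(input: string): string {
  // A single pass over the input, so already-produced entities are never re-encoded.
  return input.replace(/[&"'`<>]/g, (ch) => ATTRIBUTE_ENTITIES[ch] ?? ch);
}

console.log(encodeAttributeContext('"onmouseover="alert(1)'));
// &quot;onmouseover=&quot;alert(1)
```

This only helps when the attribute value is quoted in the template. An unquoted attribute can be broken out of with a plain space.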
Style Context
As an attacker, getting XSS in the style context is usually related to IE. Please read the following description and code.
/**
* XSS protection function for style context only
* @usecases
* @double quoted case e.g.,
* <span style="use this function if output reflects here"></span>
* @single quoted case e.g.,
* <div style='use this function if output reflects here'></div>
* OR <style>use this function if output reflects here</style>
* @description
* Sanitize/Filter meta or control characters that attacker may use to execute JavaScript e.g.,
* ( is filtered because width:expression(alert(1))
* & is filtered in order to stop decimal + hex + HTML5 entity encoding
* < is filtered in case developers are using <style></style> tags instead of style attribute.
* < is filtered because attacker may close the </style> tag and then execute JavaScript.
* The function allows simple styles e.g., color:red, height:100px etc.
* @author Ashar Javed
* @Link https://twitter.com/soaj1664ashar
* @demo http://xssplaygroundforfunandlearn.netai.net/final.html
*/
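function styleContextCleaner($input) {
    // "&" is listed first: str_replace() applies the pairs in order, so a later
    // "&" rule would re-encode the entities produced by the earlier rules.
    $bad_chars = array("&", "\"", "'", "`", "(", "\\\\", "<");
    $safe_chars = array("&amp;", "&quot;", "&apos;", "&grave;", "&lpar;", "&bsol;", "&lt;");
    $output = str_replace($bad_chars, $safe_chars, $input);
    return stripslashes($output);
}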
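For the style context, a client can also take the opposite approach and accept only values that match a narrow allow-list. The TypeScript sketch below is a different technique from the PHP function above, not a port of it. It assumes the property name (for example color or height) is fixed by the developer and only the value comes from the user; SIMPLE_STYLE_VALUE and isSimpleStyleValue are hypothetical names.

```typescript
// Hypothetical allow-list check for a single CSS value such as "red" or "100px".
// Letters, digits, "#", "%", ".", "-" and spaces are accepted; everything else,
// including "(", quotes, backslashes, "<", "&", ":" and ";", causes rejection.
const SIMPLE_STYLE_VALUE = /^[a-zA-Z0-9#%.\- ]+$/;

function isSimpleStyleValue(value: string): boolean {
  return SIMPLE_STYLE_VALUE.test(value);
}

console.log(isSimpleStyleValue("100px"));                // true
console.log(isSimpleStyleValue("expression(alert(1))")); // false
```

Rejecting a value outright is stricter than encoding it, so legitimate values such as rgb(0, 0, 0) are refused as well. Whether that trade-off is acceptable depends on the application.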
Conclusion
Fighting XSS with server-side encoding can be easy if you know which characters should be encoded in each context. There is only one complication: using a back-tick (`) as an attribute delimiter, as in the following.
<input name=`fname` value=`USERINPUTHERE`>
These functions cannot protect your app in that case. But I haven't seen any real-life example of this!
The approach I've described above was tested against tens of hackers/security researchers and no one exploited it (details: https://twitter.com/soaj1664ashar/status/478939711667712000 ). Symphonycms also uses this approach for XSS prevention (all the code examples are taken from their repo: https://github.com/symphonycms/xssfilter/blob/master/lib/xss.php ).
As you can see, you need to know where in the page a variable will be reflected. Developers usually know the output locations of their variables, but if you really don't, DOMPurify could be more useful for you, although I believe it may add code complexity and make the code harder to maintain.
DOMPurify
DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (9+), Firefox and Chrome - as well as almost anything else using Blink or WebKit). It doesn't break on IE6 or other legacy browsers. It simply does nothing there.
DOMPurify is written by security people who have vast background in web attacks and XSS. Fear not.
Further information can be found here ( https://github.com/cure53/DOMPurify )
Upvotes: 2
Reputation: 4514
In theory, encoding client-side is no more dangerous than encoding server-side. The key to making it secure is how rigorous you are in applying suitable encoding in all the places that render your data. You can certainly create a good implementation for rendering user-submitted data safely on both the client and server sides. Practically though, a drawback of implementing output encoding client-side is that a potential attacker can easily examine your source code for flaws. This means that if there are bugs in your client-side encoding implementation, they will be easier to find than, say, on the server side (assuming a closed-source system). If you are developing open-source software, then this point is moot.
Also as you said, an attacker modifying your client-side encoding code is a non-issue as they will only be modifying their own copy of the code and will not affect other visitors.
IMO it is actually cleaner to let the client handle encoding, especially if you are developing an API which is shared by web and native mobile applications. You don't want your mobile application to have to convert HTML-encoded values back to their original form.
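To illustrate the point about shared APIs, here is a rough TypeScript sketch in which the API returns raw values and the web client encodes them only at render time. The names escapeHtml, html and apiResponse are hypothetical, and the response object stands in for whatever the real API would return.

```typescript
// Encodes the characters that matter in element content and quoted attributes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: the literal parts are trusted markup written by the developer,
// every interpolated value is treated as untrusted and encoded.
function html(strings: TemplateStringsArray, ...values: unknown[]): string {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? escapeHtml(String(values[i])) : ""),
    ""
  );
}

// Hypothetical API payload: stored and transmitted unencoded, so a mobile client can use it directly.
const apiResponse = { author: "Tom & Jerry", comment: '<img src=x onerror="alert(1)">' };

const markup = html`<li><b>${apiResponse.author}</b>: ${apiResponse.comment}</li>`;
console.log(markup);
// <li><b>Tom &amp; Jerry</b>: &lt;img src=x onerror=&quot;alert(1)&quot;&gt;</li>
```

A native mobile client would read apiResponse.author as the plain string "Tom & Jerry" with nothing to decode. Note that this sketch only covers element content and quoted attribute values, not script or style contexts.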
Upvotes: 3