Reputation: 1028
I need to decode HTML entities in JavaScript, e.g.:
var str = 'apple &amp; banana';
var strDecoded = htmlDecode(str); // I expect 'apple & banana'
There is no guarantee that the given string is actually encoded, and the common jQuery and DOM tricks are vulnerable to XSS:
var attackStr = '&</textarea><img src=x onerror=alert(1)>ハローワールド'; // if you see 1 alerted, it means it is XSS vulnerable
var strDecoded; // I wish to get: &</textarea><img src=x onerror=alert(1)>ハローワールド
strDecoded = $('<div/>').html(attackStr).text(); // vulnerable in all browsers
strDecoded = $('<textarea/>').html(attackStr).text(); // vulnerable in ie 9 and firefox
var dv = document.createElement('div');
dv.innerHTML = attackStr; // vulnerable in all browsers
strDecoded = dv.innerText;
var ta = document.createElement('textarea');
ta.innerHTML = attackStr; // vulnerable in ie 9 and firefox
strDecoded = ta.value;
Is there any XSS-safe way to html-decode?
Upvotes: 3
Views: 8864
Reputation: 1028
The best I could get so far:
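function htmlDecode(str) {
    if (typeof str != "string") return str;
    // Escape angle brackets first so no markup is ever parsed as HTML
    str = str.replace(/</g, "&lt;");
    str = str.replace(/>/g, "&gt;");
    var ta = document.createElement("textarea");
    ta.innerHTML = str; // only entities are left to decode
    return ta.value;
}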
//test:
var attackStr = '&</textarea><img src=x onerror=alert(1)>ハローワールド';
alert(htmlDecode(attackStr)); // &</textarea><img src=x onerror=alert(1)>ハローワールド
Upvotes: 0
Reputation: 6742
Taking a mix of your code and the highest-voted (not the accepted) answer at HTML Entity Decode, how about this:
var decodeEntities = (function() {
    // this prevents any overhead from creating the object each time
    var element = document.createElement('textarea');
    function decodeHTMLEntities (str) {
        if (str && typeof str === 'string') {
            // escape angle brackets so the input cannot introduce markup
            str = str.replace(/</g, "&lt;");
            str = str.replace(/>/g, "&gt;");
            element.innerHTML = str;
            str = element.textContent;
            element.textContent = '';
        }
        return str;
    }
    return decodeHTMLEntities;
})();
Fiddle here: http://jsfiddle.net/ursu67z6/
You could also have a look at https://github.com/mathiasbynens/he. I haven't gone through it myself, but it might handle some cases better. I expect that if you are only decoding rather than encoding, the DOM-based approach is better.
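To see why escaping the angle brackets first matters, that step can be isolated and run without a DOM. This is only a sketch: `escapeAngleBrackets` is a name made up for illustration, and it assumes the two replacements in `decodeHTMLEntities` are meant to emit `&lt;` and `&gt;`.

```javascript
// Hypothetical helper: neutralise markup before the string reaches innerHTML.
// Ampersands are deliberately left alone so existing entities can still be decoded.
function escapeAngleBrackets(str) {
  if (typeof str !== 'string') return str;
  return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>';
var escaped = escapeAngleBrackets(attackStr);
console.log(escaped); // &amp;&lt;/textarea&gt;&lt;img src=x onerror=alert(1)&gt;
```

After this step the string contains no literal `<` or `>`, so assigning it to `innerHTML` can only produce text, and the entities are decoded when the text is read back.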
Upvotes: 5
Reputation: 573
Here is a clean solution that does not involve injecting the HTML anywhere. Copy both of these functions somewhere in your code: http://phpjs.org/functions/html_entity_decode/ and http://phpjs.org/functions/get_html_translation_table/
You'll have to remove "this" in "html_entity_decode" on line 26.
console.log( html_entity_decode('&</textarea><img src=x onerror=alert(1)>') );
// &</textarea><img src=x onerror=alert(1)>
Cheers.
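The idea behind a string-only decoder can be sketched in a few lines. This is not the phpjs code: `decodeEntitiesNoDom` and its tiny `NAMED` table are assumptions made for illustration, a real table would need the full list of HTML named entities, and `String.fromCodePoint` assumes an ES2015 environment. Because nothing is ever parsed as HTML, markup in the input stays inert text.

```javascript
// Hypothetical, deliberately small table; a complete decoder needs every named entity.
var NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0' };

function decodeEntitiesNoDom(str) {
  if (typeof str !== 'string') return str;
  // Single pass, so text produced by one replacement is never decoded a second time.
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Unknown names are returned as-is.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeEntitiesNoDom('apple &amp; banana')); // apple & banana
console.log(decodeEntitiesNoDom('&amp;lt;'));           // &lt;
console.log(decodeEntitiesNoDom('&#x30CF;&#12525;'));   // ハロ
```

The single `replace` call is a design choice: decoding `&amp;` first and `&lt;` in a later pass would turn `&amp;lt;` into `<`, which is one decode too many.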
-- EDIT --
Your textarea trick looks good. Did it cover all your use cases?
The only other JavaScript solution I can think of is to use a sandboxed, same-domain iframe. It gives me good results but only works in recent web browsers. I'm posting the code just in case.
function safeHtmlDecode(str, callback)
{
    var sameDomainBlankPage = document.location.href; // This should be a blank HTML page located on the same domain
    // sandbox without allow-scripts: markup injected into the frame cannot run script
    var $iframe = $('<iframe sandbox="allow-same-origin"/>').attr("src", sameDomainBlankPage);
    $iframe.on("load", function() {
        var body = $iframe.contents()[0].body;
        body.innerHTML = str;
        callback(body.innerText);
    });
    $("body").append($iframe);
}
$(document).ready(function(){
    var attackStr = '&</textarea><img src=x onerror=alert(1)>ハローワールド';
    safeHtmlDecode(attackStr, function(htmlString) {
        console.log( htmlString );
    });
});
Upvotes: 0
Reputation: 504
You can use jQuery functions like the ones below to encode or decode the input string:
function htmlEncode(value){
    return $('<div/>').text(value).html();
}
function htmlDecode(value){
    return $('<div/>').html(value).text();
}
htmlDecode('&lt;b&gt;test&lt;/b&gt;')
// result "<b>test</b>"
htmlDecode('test')
// result "test"
Hope this helps!
Upvotes: 0
Reputation: 1040
If you just want to display the content safely, use innerText or jQuery's .text() method instead of innerHTML / .html().
Upvotes: 0
Reputation: 1318
DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (9+), Firefox and Chrome - as well as almost anything else using Blink or WebKit). It doesn't break on IE6 or other legacy browsers. It simply does nothing there.
DOMPurify is written by security people who have vast background in web attacks and XSS. Fear not.
I've tested and used DOMPurify, and it's really good at sanitizing untrusted data on the client side. Using it is very simple.
Import purify.js:
<script type="text/javascript" src="purify.js"></script>
Then pass your untrusted string to DOMPurify.sanitize:
var attackStr = '</textarea><img src=x onerror=alert(1)>';
var clean = DOMPurify.sanitize(attackStr);
The output will look like this:
<img src="x">
You can test your XSS payload at https://cure53.de/purify
Source code, examples and documentation can be found at https://github.com/cure53/DOMPurify
Upvotes: 1