daghan

Reputation: 1028

XSS-safe HTML decode for JavaScript

I need to decode HTML entities in JavaScript, e.g.:

var str = 'apple &amp; banana';
var strDecoded = htmlDecode(str); // I expect 'apple & banana'

There is no guarantee that the given str is already encoded, and the common jQuery and DOM tricks are vulnerable to XSS:

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;'; // if you see 1 alerted, it means it is XSS vulnerable
var strDecoded; // I wish to get: &</textarea><img src=x onerror=alert(1)>ハローワールド

strDecoded = $('<div/>').html(attackStr).text(); // vulnerable in all browsers

strDecoded = $('<textarea/>').html(attackStr).text(); // vulnerable in ie 9 and firefox


var dv = document.createElement('div');
dv.innerHTML = attackStr; // vulnerable in all browsers
strDecoded = dv.innerText;

var ta = document.createElement('textarea');
ta.innerHTML = attackStr; // vulnerable in ie 9 and firefox
strDecoded = ta.value;

Is there any XSS-safe way to html-decode?

Upvotes: 3

Views: 8864

Answers (6)

daghan

Reputation: 1028

The best I could get so far:

function htmlDecode(str){
    if(typeof str != "string") return str;
    str = str.replace(/</g,"&lt;");
    str = str.replace(/>/g,"&gt;");     
    var ta = document.createElement("textarea");
    ta.innerHTML = str;
    return ta.value;        
}

//test:
var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
alert(htmlDecode(attackStr)); // &</textarea><img src=x onerror=alert(1)>ハローワールド

Upvotes: 0

Chris Lear

Reputation: 6742

Taking a mix of your code and the highest-voted (not the accepted) answer at HTML Entity Decode, how about this:

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('textarea');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      str = str.replace(/</g,"&lt;");
      str = str.replace(/>/g,"&gt;");
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

Fiddle here: http://jsfiddle.net/ursu67z6/

You could also have a look at https://github.com/mathiasbynens/he. I haven't gone through it myself, but it might handle some cases better. I expect that if you are only decoding rather than encoding, the DOM-based approach is better.

Upvotes: 5

Romain

Reputation: 573

Here is a clean solution that does not require injecting the HTML anywhere. Copy both of these functions somewhere in your code: http://phpjs.org/functions/html_entity_decode/ and http://phpjs.org/functions/get_html_translation_table/

You'll have to remove "this" in "html_entity_decode" on line 26.

console.log( html_entity_decode('&amp;</textarea><img src=x onerror=alert(1)>') );
// &</textarea><img src=x onerror=alert(1)>
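
For illustration, the general idea of decoding without touching the DOM can be sketched as below. This is not the phpjs code; decodeEntitiesNoDom and NAMED_ENTITIES are made-up names, and the named-entity table is deliberately tiny (a complete table is far larger), so treat it as a sketch under those assumptions rather than a drop-in replacement.

```javascript
// Minimal sketch of DOM-free entity decoding. All names here are
// hypothetical, and the named-entity table is intentionally incomplete.
var NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
  nbsp: '\u00a0'
};

function decodeEntitiesNoDom(str) {
  if (typeof str !== 'string') return str;
  // A single pass over the input, so '&amp;lt;' becomes '&lt;' and not '<'.
  return str.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g,
    function (match, body) {
      if (body.charAt(0) === '#') {
        var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
        var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave NUL, surrogates and out-of-range code points untouched.
        if (code === 0 || code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff)) {
          return match;
        }
        return String.fromCodePoint(code);
      }
      // Unknown names are returned unchanged instead of being guessed.
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
        ? NAMED_ENTITIES[body]
        : match;
    }
  );
}
```

Because the input is only ever treated as a string, no markup is parsed and nothing can execute. The result is still untrusted text, so it should be written to the page with a text API, not with innerHTML.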

Cheers.

-- EDIT --

Your textarea trick looks good. Did it cover all your use cases?

The only other JavaScript solution I can think of is to use a sandboxed, same-domain iframe. It gives me good results but would only work in recent web browsers. I'm posting the code just in case.

function safeHtmlDecode(str, callback)
{
    var sameDomainBlankPage = document.location.href; // This should be a blank html page located on same domain
    var $iframe = $('<iframe sandbox="allow-same-origin"/>').attr("src", sameDomainBlankPage); // declared with var to avoid leaking a global
    $iframe.on("load", function() {
        var body = $iframe.contents()[0].body;
        body.innerHTML = str;
        callback(body.innerText);
    });
    $("body").append($iframe);
}
$(document).ready(function(){
    var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
    safeHtmlDecode(attackStr, function(htmlString) {
        console.log( htmlString );
    });
});

Upvotes: 0

Ashish Panchal

Reputation: 504

You can use jQuery functions like the ones below to encode or decode the input string:

function htmlEncode(value){
  return $('<div/>').text(value).html();
}

function htmlDecode(value){
  return $('<div/>').html(value).text();
}

htmlDecode('&lt;b&gt;test&lt;/b&gt;')
// result "<b>test</b>"

htmlDecode('test')
// result "test"

In this code:

  1. I create a div that is never attached to the page
  2. The input string is passed to the htmlDecode function
  3. jQuery encodes or decodes the string automatically
  4. The resulting HTML or text is returned

Hope this helps!

Upvotes: 0

Ruchit Patel

Reputation: 1040

If you want to display the content safely, use innerText or the jQuery .text() method instead of innerHTML/.html().
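
A minimal sketch of that idea, assuming el is a DOM element (showUntrustedText is a hypothetical helper name, and the 'out' id in the usage comment is made up):

```javascript
// Hypothetical helper: writes untrusted text into an element as plain text.
// `el` is assumed to be a DOM element, i.e. anything exposing textContent.
function showUntrustedText(el, untrusted) {
  // Assigning to textContent does not invoke the HTML parser, so any
  // markup in `untrusted` is shown literally instead of becoming elements.
  el.textContent = String(untrusted);
  return el;
}

// Usage in a browser (the element id is an assumption):
// showUntrustedText(document.getElementById('out'), attackStr);
```

Note that this only displays the string as it is. It does not decode entities, so '&amp;' will appear on the page as the five characters '&amp;'.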

Upvotes: 0

Mehmet Ince

Reputation: 1318

DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (9+), Firefox and Chrome - as well as almost anything else using Blink or WebKit). It doesn't break on IE6 or other legacy browsers. It simply does nothing there.

DOMPurify is written by security people who have a vast background in web attacks and XSS. Fear not.

I've tested and used DOMPurify, and it's really good at sanitizing untrusted data on the client side. Using it is very simple.

Import purify.js:

<script type="text/javascript" src="purify.js"></script>

Then pass your untrusted string to DOMPurify.sanitize:

var attackStr = '</textarea><img src=x onerror=alert(1)>';
var clean = DOMPurify.sanitize(attackStr);

The output will look like this:

<img src="x">

You can test your XSS payload at https://cure53.de/purify

The source code, examples and documentation can be found at https://github.com/cure53/DOMPurify

Upvotes: 1
