Reputation: 6943
I am receiving HTML from an untrusted third-party source via JSON, and I want to display it in a div container like so:
$('#container').html(dangerousHTMLCode);
How can I prevent JavaScript injection (most important) and, secondarily, keep the injected HTML from altering the styles of the page in general? This is all client side. The container div should have a flexible height that matches the height of its content (possibly ruling out an iframe solution).
Update: The goal is to strip out all JavaScript and CSS from the HTML. This includes JS and CSS present in the attributes of DOM elements (style="", onclick="", etc.).
Upvotes: 4
Views: 3140
Reputation: 12786
You could use an iframe and load the untrusted HTML inside it. The iframe would use the sandbox attribute to prevent any JavaScript injected inside the iframe from modifying the environment outside the iframe.
<iframe class="untrusted" src="http://unsafe.example.com/" sandbox></iframe>
Or, if the untrusted HTML arrives wrapped in JSON, unwrap the JSON on your server and serve the HTML from there:
<iframe class="untrusted" src="/Unsafe?url=http://unsafe.example.com/foo.json" sandbox></iframe>
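If the HTML is already in a client-side string, a sandboxed iframe can also be fed directly through the srcdoc attribute instead of a server URL. The following is a minimal sketch under that assumption; escapeAttr and sandboxedIframeMarkup are hypothetical helper names, not part of any library, and srcdoc support depends on the browser.

```javascript
// Escape a string for use inside a double-quoted HTML attribute value.
// "&" must be handled first so the entities produced for quotes are not re-escaped.
function escapeAttr(text) {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: build markup for a sandboxed iframe whose document
// is the untrusted HTML itself (passed through the srcdoc attribute).
function sandboxedIframeMarkup(untrustedHtml) {
  return '<iframe class="untrusted" sandbox srcdoc="' +
    escapeAttr(untrustedHtml) + '"></iframe>';
}

var markup = sandboxedIframeMarkup('<p onclick="x()">a & b</p>');
// In a page: $('#container').html(markup);
```

Note that with an empty sandbox value the framed document is treated as a separate origin, so the parent page cannot read its content height directly. That makes the flexible-height requirement from the question harder to meet with this approach.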
Upvotes: 0
Reputation: 4628
OK, my first attempt at this was a failure. I agree with Jan Dvorak's comment that the best approach is probably to do this with an XSS tool in a server-side proxy. You probably have to go through some kind of proxy anyway because you're making cross-site requests, and if you're using JSONP, all is lost already.
However, since the question asked for a jQuery way of doing this...
Ideally, you'd find an HTML parser written in javascript, use it to build an element tree, and remove any elements or attributes that didn't match a whitelist of safe attributes.
Since I'm not aware of such a parser, and since you're in a browser that does have a parser, we'll try using that. We have to be careful though, that parser is attached to a javascript engine and an HTTP client, among other things.
First, as was pointed out in feedback to my first attempt, we have to do some work before we do anything that will create DOM elements, because some events can run prior to DOM insertion. We need at the very least to make sure that no onX attributes will be parsed before any DOM objects are created. It might also be a good idea to run some interference with preloading in general. To that end, let's do some simple text transformation:
var xmlNameStartChars = "a-zA-Z_\\u00c0-\\u00d6\\u00d8-\\u00f6\\u00f8-\\u02ff\\u0370-\\u037d\\u037f-\\u1fff\\u200c\\u200d\\u2070-\\u218f\\u2c00-\\u2fef\\u3001-\\udbbf\\udc00-\\udfff\\uf900-\\ufdcf\\ufdf0-\\ufffd";
var xmlNsPfx = "[" + xmlNameStartChars + "][-.0-9\\u00b7\\u0300-\\u036f\\u203f-\\u2040" + xmlNameStartChars + "]*:";
// Matches the start of an opening or closing tag, including an optional namespace prefix
var tagStartRE = new RegExp("<\\/?(" + xmlNsPfx + ")?", "g");
// Matches the same tag start followed by the "z" we insert, so that it can be removed later
var tagStartDeZRE = new RegExp("(<\\/?(" + xmlNsPfx + ")?)z", "g");
dangerousHTMLCode = dangerousHTMLCode.replace(/on/gi, "z$&"); // run interference with onX
// run interference with preloading
// But don't interfere with namespaces
dangerousHTMLCode = dangerousHTMLCode.replace(tagStartRE, "$&z");
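To see what this text transformation does, here is a small sketch that can be run outside the browser. It assumes a simplified, ASCII-only namespace-prefix pattern to stay short, and defang is a hypothetical name for the two replacements combined.

```javascript
// Simplified ASCII-only stand-in for the namespace-prefix pattern
// (an assumption made to keep this sketch short).
var nsPfx = "[a-zA-Z_][-.0-9a-zA-Z_]*:";
var tagStart = new RegExp("<\\/?(" + nsPfx + ")?", "g");

// Hypothetical helper combining both replacements.
function defang(html) {
  return html
    .replace(/on/gi, "z$&")     // every "on" gains a "z" prefix, so onerror becomes zonerror
    .replace(tagStart, "$&z");  // every tag name gains a "z" prefix, so <img becomes <zimg
}

var sample = '<img src="x.png" onerror="alert(1)"><p>Done</p>';
var defanged = defang(sample);
console.log(defanged);
// <zimg src="x.png" zonerror="alert(1)"><zp>Dzone</zp>
```

Ordinary text is mangled as well ("Done" becomes "Dzone"), which is why the transformation has to be reversed before the result is shown to the user.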
Now we have made a best effort at making this safe to build a DOM tree out of. NOTE, however, that this best effort is NOT a guarantee of safety - there may very well be attacks that I have not considered, possibly revolving around bugs in past, present or future browsers or plugins. A particular concern is that I have made the assumption that the only attributes dangerous enough to have to interfere with at this point start with "on"; I think that's the case but I'm well short of 100% confident about it.
Continue at your own risk
As Jan pointed out to me in comments to my first attempt, a whitelist approach is probably superior to a blacklist approach. I'm going to start with a pretty basic list of whitelisted elements (add/remove to taste); we'll prefix them with z because our text manipulation did that too.
var wlElements = "zdiv, zspan, zem, zstrong, zp, za, zimg, ztable, zthead, ztbody, ztfoot, ztr, zth, ztd";
var nonWlSelector = ":not(" + wlElements + ")";
var dangerousDOM = $("<div/>").html(dangerousHTMLCode);
dangerousDOM.find(nonWlSelector).remove();
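If the whitelist is easier to maintain as plain tag names, the z-prefixed selector can be generated instead of typed by hand. A small sketch, where buildNonWhitelistSelector and allowedTags are hypothetical names introduced for illustration:

```javascript
// Hypothetical helper: turn plain tag names into the z-prefixed :not() selector.
function buildNonWhitelistSelector(tagNames) {
  var prefixed = tagNames.map(function (name) {
    return "z" + name;
  });
  return ":not(" + prefixed.join(", ") + ")";
}

var allowedTags = ["div", "span", "em", "strong", "p", "a", "img"];
var generatedSelector = buildNonWhitelistSelector(allowedTags);
console.log(generatedSelector);
// :not(zdiv, zspan, zem, zstrong, zp, za, zimg)
```

The result can be passed to dangerousDOM.find(...) in the same way as the hand-written selector.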
Now for the fun part: you have to remove dangerous attributes. This time I blacklist, in part because I was too lazy to think of all the attributes I'd want to whitelist. I do, however, whitelist URL schemes in src and href: it's not just "javascript:" that's potentially unsafe; "vbscript:" and "livescript:" at a minimum are dangerous in some browsers. You should probably whitelist attributes instead, since there's a real chance that I forgot about, or never knew about, script attributes that do not start with "on". I haven't found a way of finding "bad" attributes without doing a brute-force DOM walk, so let's do that:
// The optional (.*:)? group allows for a namespace prefix such as "xlink:"
var badAttrs = /^(.*:)?(zon|style|background)/i;
var suspectAttrs = /^(.*:)?(src|href)$/i;
var goodSchemes = /^\s*([^:]*$|ftp:|tel:|https?:)/i;
function processAttributes(element) {
    var toRemove = [];
    var attrs = element.attributes;
    // Collect names first; removing while iterating would shift the live attribute list
    for (var i = 0; i < attrs.length; i++) {
        var name = attrs[i].name, val = attrs[i].value;
        if (badAttrs.test(name) || (suspectAttrs.test(name) && !goodSchemes.test(val))) {
            toRemove.push(name);
        }
    }
    while (toRemove.length) {
        element.removeAttribute(toRemove.pop());
    }
}
// Start walking from the root of our DOM fragment
var root = dangerousDOM[0];
var elements = [root];
// Walk until we have no more elements, processing their attributes and adding their children
while (elements.length) {
    var elem = elements.pop();
    if (elem.hasAttributes()) {
        processAttributes(elem);
    }
    // Find children of this element and queue them up
    var child = elem.firstChild;
    while (child) {
        if (child.nodeType === 1) {
            // It's an element
            elements.push(child);
        }
        child = child.nextSibling;
    }
}
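The attribute test can be checked on its own, without a DOM. This sketch assumes the namespace-prefix group in the two name patterns is optional (so that unprefixed names such as zonclick are caught); shouldRemove is a hypothetical name for the condition used inside processAttributes.

```javascript
// Assumption: the namespace-prefix group is optional in both name patterns.
var badAttrs = /^(.*:)?(zon|style|background)/i;
var suspectAttrs = /^(.*:)?(src|href)$/i;
var goodSchemes = /^\s*([^:]*$|ftp:|tel:|https?:)/i;

// Hypothetical helper: true when an attribute should be stripped.
function shouldRemove(name, value) {
  return badAttrs.test(name) ||
    (suspectAttrs.test(name) && !goodSchemes.test(value));
}

var checks = [
  shouldRemove("zonclick", "alert(1)"),           // event handler, already z-prefixed
  shouldRemove("style", "color: red"),            // inline CSS
  shouldRemove("href", "javascript:alert(1)"),    // scheme not on the whitelist
  shouldRemove("xlink:href", "vbscript:msgbox"),  // namespaced URL attribute
  shouldRemove("href", "https://example.com/"),   // whitelisted scheme
  shouldRemove("src", "/images/logo.png"),        // no scheme at all
  shouldRemove("title", "hello")                  // harmless attribute
];
console.log(checks.join(","));
// true,true,true,true,false,false,false
```

The scheme check is deliberately conservative: a scheme-less URL that happens to contain a colon, such as "page?x=a:b", would also be stripped.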
And now we're ready to undo text manipulations we did at the start and inject the fragment. Again, our best effort to make this safer could have failed to consider attacks that work today, and could still allow attacks against future browser/plugin bugs. So, again, continue at your own risk.
var lessDangerousHtml = dangerousDOM.html();
lessDangerousHtml = lessDangerousHtml.replace(/z(on)/gi, "$1");
lessDangerousHtml = lessDangerousHtml.replace(tagStartDeZRE, "$1");
$("#container").html(lessDangerousHtml);
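The reversal step can also be tried in isolation. This sketch again assumes a simplified, ASCII-only prefix pattern, and a de-z expression that matches both opening and closing tags; restore is a hypothetical name.

```javascript
// Simplified ASCII-only stand-in for the namespace-prefix pattern (an assumption).
var nsPfx = "[a-zA-Z_][-.0-9a-zA-Z_]*:";
// Matches "<z", "</z", "<ns:z" and "</ns:z", capturing everything before the "z".
var tagStartDeZ = new RegExp("(<\\/?(" + nsPfx + ")?)z", "g");

// Hypothetical helper: undo both text manipulations on serialized, sanitized HTML.
function restore(html) {
  return html
    .replace(/z(on)/gi, "$1")        // "Dzone" becomes "Done" again
    .replace(tagStartDeZ, "$1");     // "<zdiv" becomes "<div", "</zp" becomes "</p"
}

var sanitized = '<zdiv><zp>Dzone</zp><za href="https://example.com/">link</za></zdiv>';
var restored = restore(sanitized);
console.log(restored);
// <div><p>Done</p><a href="https://example.com/">link</a></div>
```

If the de-z expression only matched closing tags, opening tags would keep their z prefix and the browser would treat them as unknown elements.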
Many thanks to t.niese for constructive criticism.
Upvotes: 2