Zoltán Tamási
Zoltán Tamási

Reputation: 12764

How to do white-listed HTML encoding in ASP.NET MVC 5?

I'm working on a mini-CMS module for one of my projects, where users are allowed to edit content in markdown. I'm using markdown-it for parsing and showing a preview.

I was thinking a lot about how to send the input to the server, and also how to store it in the database. I came to a conclusion to avoid duplicating the markdown parsing at server-side, and send both markdown and the parsed HTML to the server. I think nowadays the added overhead is minimal, even on a site where edits are heavy.

So at final stage I still need to validate the HTML sent to the server, as it can be a security bottleneck of the system. I've read a lot about Microsoft's implementation of AntiXSS, and how it is (or was) quite unusable for such scenarios, as it was too gready. For example I've found this article with even a helper code (using HTMLAgilityPack) to give a usable sanitizing implementation.

Unfortunately I haven't found anything newer than 2013 on this topic. I'd like to ask at present how to do a proper HTML encoding where there are allowed tags and attributes, but still safe from any kind of XSS attacks? Is such a code like in the article still needed, or are there any built-in solutions?

Also, if my choice of client-side markdown parsing is not viable, what are some other options? What I want to avoid, is duplicating all kinds of markdown logic at both client and server. For example I've prepared several custom extensions for markdown-it, etc.

Upvotes: 1

Views: 1220

Answers (2)

Zoltán Tamási
Zoltán Tamási

Reputation: 12764

I ended up using the code in the article. I made an important change, I removed style attributes from the whitelist completely. I don't need them, the styling which I allow can be achieved by classes. Also, style attributes are also dangerous and hard to encode/escape properly. Now I feel that the code is safe enough for my current purposes.

Upvotes: 0

Gabor Lengyel
Gabor Lengyel

Reputation: 15570

If you allow html to be edited on the client and stored to the server, you are basically opening up a can of worms. This applies to client side html editors, and also to your usecase where you want to save html generated from markdown. The problem is that a malicious user may send any html to your backend, not just one that can actually be generated from the markdown. Html code in this case will be plain user input and as such must not be trusted.

Say you want to implement whitelisting of tags and tag attributes, the HTMLAgilityPack way. Consider a simple link in html. You obviously want to allow the <a> tag, and also the href attribute to that so that links are possible. But what about <a href="javascript:alert(1)"> then? It will be vulnerable to obvious XSS, and this is just one example, it would be vulnerable in numerous ways.

Even worse is that you probably want to render user-given html on the client before a server roundtrip (something like a preview), and also save it to your database and render it after downloading it again. For this you have to turn off request validation, and also automatic encoding as those would make this impossible.

So you have a few options that could actually work to prevent XSS.

  • Client-side sanitization: You could use the client-side sanitizer from Google Caja (only the Javascript library, not the whole thing) to remove Javascript from any html content. The way this would work is before displaying any such html (before previewing html on the client, or before displaying html downloaded from the server), you would run it through Caja, and that would remove any Javascript, thus eliminating XSS. It works reasonably well in my experience, it removes Javascript from CSS too, and also the trivial ones like href, src, a script tag, event attributes (onclick, onmouseover, etc). Another similar library is HTML Purify, but that only works for new browsers and does not remove Javascript from CSS (because that does not work in newer browsers anyway).

  • Server-side sanitization: You could also use Caja on the server side properly, but that's probably way too difficult and hard to maintain for your usecase, and also if only this is implemented, preview on the client (without a server roundtrip) would still be vulnerable to DOM XSS.

  • Content-Security-Policy: You could use the Content-Security-Policy response header to disable all inline Javascript on your website. One drawback is that it has implications on your client-side architecture (you cannot have inline Javascript at all, obviously), and also browser support is limited, and in unsupported browsers your page will in fact be vulnerable to XSS. However, the latest version of current major browser all support Content-Security-Policy, so it is indeed a good option.

  • Separate frame: You could serve unsafe html from a different origin (ie. a different subdomain) and accept the risk of XSS on that origin. However, cross-frame communication would still be a problem, and so would authentication and/or CSRF depending on the solution. This is kind of the old school way, options above are probably better for your usecase.

You could also use a combination of these for defense in depth.

Upvotes: 1

Related Questions