Reputation: 41442
Let's say I have a simple ASP.NET MVC blog application and I want to allow readers to add comments to a blog post. If I want to prevent any type of XSS shenanigans, I could HTML encode all comments so that they become harmless when rendered. However, what if I wanted to allow some basic formatting such as hyperlinks, bold, italics, etc.?
I know that StackOverflow uses the WMD Markdown Editor, which seems like a great choice for what I'm trying to accomplish, if not for the fact that it supports both HTML and Markdown, which leaves it open to XSS attacks.
Upvotes: 10
Views: 4923
Reputation: 120586
If you need to do it in the browser: http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
Upvotes: 2
Reputation: 2849
If you are not looking to use an editor you might consider OWASP's AntiSamy.
You can run an example here: http://www.antisamy.net/
Upvotes: 8
Reputation: 8042
I'd vote for FCKEditor, but you still have to do some extra processing on the output it returns.
Upvotes: 1
Reputation: 17528
You could use an HTML whitelist so that certain tags can still be used, but everything else is blocked.
There are tools that can do this for you. SO uses the code that Slough linked.
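To make the whitelist idea concrete, here is a minimal sketch in Python using only the standard library. It is not the code SO uses and not a replacement for a maintained sanitizer; the tag list, the allowed URL schemes, and the names `ALLOWED_TAGS`, `WhitelistSanitizer` and `sanitize` are all assumptions made up for illustration. It keeps a handful of tags, drops every attribute except a checked `href`, discards `<script>`/`<style>` content, and HTML-encodes all text.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {"b": set(), "i": set(), "em": set(), "strong": set(), "a": {"href"}}
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(value):
    """Accept only absolute http(s) URLs; anything unparseable is rejected."""
    try:
        return urlparse(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # drops onclick, style, and so on
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is always encoded


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Note that this sketch does not balance tags, so an unclosed `<b>` in a comment can still affect the layout of whatever follows it. A production sanitizer would also need to handle that.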
Upvotes: 0
Reputation: 27285
Why don't you use Jeff's code? http://refactormycode.com/codes/333-sanitize-html
Upvotes: 1
Reputation: 33700
I'd suggest you submit only the Markdown to the server. On the front end, the client can type Markdown and see an HTML preview (same as SO), but only the Markdown source gets posted. Then you can validate it, generate the HTML, escape it and store it.
I believe that's the way most of us do it. Either way, Markdown is there to spare people from writing structured HTML and to give that power to those who wouldn't know how to write it.
If there's something specific you'd like to do with the HTML, you can tweak it with some CSS inheritance ('.comment a { color: #F0F; }'), with front-end JS, or by traversing the HTML generated from the Markdown before you store it.
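A rough sketch of that server-side step, in Python for brevity. It assumes one particular ordering, which is my reading rather than something stated above: HTML-encode the submitted text first, then convert a small Markdown subset (links, bold, italics) into tags. The function name `render_comment` and the three patterns are hypothetical, and a real application would use a proper Markdown library rather than regular expressions.

```python
import html
import re

# Only absolute http(s) links are converted; '*' is excluded from the URL so
# the emphasis patterns below cannot rewrite text inside the href attribute.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)*]+)\)")
# After escaping, the only literal '<' and '>' left are from tags generated
# here, so excluding them keeps emphasis from spanning a generated tag.
BOLD = re.compile(r"\*\*([^*<>]+)\*\*")
ITALIC = re.compile(r"\*([^*<>]+)\*")


def render_comment(markdown_text):
    # Escape first so any raw HTML the user typed becomes inert text.
    safe = html.escape(markdown_text, quote=True)
    safe = LINK.sub(r'<a href="\2" rel="nofollow">\1</a>', safe)
    safe = BOLD.sub(r"<strong>\1</strong>", safe)
    safe = ITALIC.sub(r"<em>\1</em>", safe)
    return safe
```

With this ordering, something like `**hi** <script>` comes out as bold text followed by an encoded, harmless `&lt;script&gt;`, and a `javascript:` link is simply left as plain text because it does not match the link pattern.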
Upvotes: 1
Reputation: 1599
How much HTML are you going to support? Just bold/italics/the basic stuff? In that case, you can convert those to markdown syntax and then strip the rest of the HTML.
The stripping needs to be done server side, before you store it. You need to validate the input on the server as well, at the same point where you check for SQL injection and other unwanted input.
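As a very rough illustration of "convert the basics, strip the rest", here is a Python sketch. The helper name `html_to_markdown` and the patterns are hypothetical, and regex-based stripping is not a complete defence on its own: this assumes the result is stored as Markdown and is still HTML-encoded when it is rendered.

```python
import re

# Bare bold/italic tags (no attributes) are mapped to Markdown markers.
TO_MARKDOWN = [
    (re.compile(r"</?(?:b|strong)\s*>", re.I), "**"),
    (re.compile(r"</?(?:i|em)\s*>", re.I), "*"),
]
# Anything else that looks like a tag is removed; its inner text is kept.
ANY_TAG = re.compile(r"<[^>]*>")


def html_to_markdown(comment):
    for pattern, marker in TO_MARKDOWN:
        comment = pattern.sub(marker, comment)
    return ANY_TAG.sub("", comment)
```

An unterminated tag (a `<` with no closing `>`) is left in place by this sketch, which is one reason the stored text should still be encoded on output.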
Upvotes: 3