HttpUtility.HtmlEncode, HttpUtility.HtmlDecode, the AntiXSS library and correctly formatting user-entered input

Question

I'm trying to develop a secure web application that can accept form data, encode it into the database to eliminate cross-site scripting issues, and then format it nicely on other web pages.

Form data is being encoded using

HttpUtility.HtmlEncode("It's my wedding!")

An example of this working is someone entering "It's my wedding!" into a textbox. This enters the database formatted as:

It&#39;s my wedding!

If I then pull this out of the database and display it using a .NET literal control, it's displayed exactly like that, with the apostrophe remaining encoded on the screen.

Web browsers interpret &amp; as an ampersand and &copy; as a copyright symbol. Why don't they interpret the code &#39; as an apostrophe?

Say that I then use:

HttpUtility.HtmlDecode("It&#39;s my wedding!");

This will sort out my apostrophe issue, but if I use the HtmlDecode method when someone has managed to inject malicious javascript into this field such as:

It's my wedding!

It'll also decode the encoded javascript, and the attack will execute. If this is the case, why are we using HttpUtility.HtmlEncode() in the first place?
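To make the concern concrete, here is a minimal sketch of the round trip. The class name, method name and payload string are made-up placeholders for illustration, not taken from a real attack or from my application; it only assumes the standard System.Web.HttpUtility methods.

```csharp
using System.Web;

public static class RoundTripDemo
{
    // Encodes the input as it would be stored, then decodes it again
    // as a careless display path might do.
    public static string RoundTrip(string input)
    {
        string encoded = HttpUtility.HtmlEncode(input);
        return HttpUtility.HtmlDecode(encoded);
    }
}
```

Calling RoundTripDemo.RoundTrip with a placeholder payload such as "<script>alert('xss')</script>" hands back the original markup unchanged, which is exactly the situation I'm worried about.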

I've seen people using the Microsoft AntiXss library at http://wpl.codeplex.com/, but it seems to be receiving horrendous reviews about its quality and effectiveness due to users' inability to amend the white-list that it offers.

What are you supposed to do to safely encode HTML and allow it to display whilst still preventing XSS attacks? Is stripping / encoding the tags specifically the only solution?

How has everyone handled this before?

Thanks!

Karl

Karl · Accepted Answer

Okay, so here's the solution I've arrived at.

I want to guard against other developers switching off request validation and outputting fields without checking what they're outputting, so I'm going to use the HttpUtility.HtmlEncode method to encode the input. This means that when other developers output this information, it's still encoded, and if they then blithely pass the contents to HttpUtility.HtmlDecode, that's their responsibility.

I, however, will build a method that decodes only the most basic characters that I see frequently in my user input and that can be considered safe. In my case, those characters are single quotes and double quotes. All other content will remain encoded. If a particular safe character that I haven't addressed turns up often in real-life user input or test input, I'll add it to the whitelist retrospectively.
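A minimal sketch of that approach is below. The class name SafeText, its method names and the contents of the whitelist dictionary are my own illustrative choices rather than anything from a library; the only framework calls assumed are HttpUtility.HtmlEncode and string.Replace.

```csharp
using System.Collections.Generic;
using System.Web;

public static class SafeText
{
    // Hypothetical whitelist: encoded forms considered safe to turn back
    // into literal characters. Extend it only after reviewing real input.
    private static readonly Dictionary<string, string> SafeEntities =
        new Dictionary<string, string>
        {
            { "&#39;", "'" },
            { "&quot;", "\"" },
        };

    // Encode everything before it goes into the database.
    public static string EncodeForStorage(string input)
    {
        return HttpUtility.HtmlEncode(input);
    }

    // Decode only the whitelisted entities; everything else
    // (including &lt; and &gt;) stays encoded.
    public static string DecodeQuotesOnly(string encoded)
    {
        if (encoded == null)
        {
            return null;
        }

        foreach (KeyValuePair<string, string> pair in SafeEntities)
        {
            encoded = encoded.Replace(pair.Key, pair.Value);
        }

        return encoded;
    }
}
```

With this, SafeText.DecodeQuotesOnly(SafeText.EncodeForStorage("It's my wedding!")) gives back "It's my wedding!", while angle brackets in any injected markup remain as &lt; and &gt;. One caveat: decoded quotes are only reasonable when the result is written into element content. Inside an HTML attribute value a literal quote can close the attribute, so this output should not be used there.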
