user48408

Reputation: 3354

Using HttpUtility.HtmlEncode and handling special characters/umlaut etc

I'm using HttpUtility.HtmlEncode to sanitise user input to protect against XSS attacks. My problem is that HtmlEncode converts special characters like ü into their HTML entity codes, and I can't find documentation on what it does and doesn't encode. To display the text correctly back to the user, I then need to HtmlDecode it.

2 questions:

  1. How does HtmlEncode decide that it needs to encode a perfectly valid character like ü but not others, such as the letters of the standard English alphabet? Does HtmlEncode encode all non-ASCII characters? What is the best way to block script tags while still allowing special characters like umlauts, without maintaining a special ignore list?

  2. Does using HtmlDecode expose a risk, since it converts potentially malicious JavaScript back into its original form?

Upvotes: 2

Views: 5474

Answers (1)

Nzall

Reputation: 3555

  1. HtmlEncode() does 2 main things:
    1. It replaces non-ASCII characters such as ü with numeric character references (ü becomes &#252;).
    2. It encodes characters that the browser could otherwise interpret as markup or script, such as <, > and &, to prevent both accidental and intentional alteration of the page.
  2. Is it dangerous to use? That depends on how you use it. The question is not so much "are you decoding?" as "are you decoding user data?". Decoding user-supplied data can definitely be dangerous, depending on what you do with the result: even just displaying the decoded text to the client can cause XSS.
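
To make both points concrete, here is a minimal sketch. It assumes System.Net.WebUtility as a stand-in for HttpUtility (it needs no System.Web reference and, as far as I know, encodes the same characters for this input); the input string and the class name are made up for illustration.

```csharp
using System;
using System.Net;

public static class HtmlEncodeDemo
{
    public static void Main()
    {
        // Hypothetical user input: an umlaut plus a script tag.
        string input = "Müller <script>alert('x')</script>";

        // Encode at the point where the value is written into HTML.
        // Both the umlaut and the markup characters are replaced.
        string encoded = WebUtility.HtmlEncode(input);
        Console.WriteLine(encoded);
        // M&#252;ller &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;

        // Decoding restores the original string, script tag included,
        // so writing the decoded value into a page reopens the XSS hole.
        string decoded = WebUtility.HtmlDecode(encoded);
        Console.WriteLine(decoded == input); // True
    }
}
```

A browser already renders &#252; as ü, so the encoded string can be sent to the page as it is. That suggests storing the raw input and encoding once on output, rather than encoding on input and decoding again for display.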

There is far more to say about encoding and decoding than I can write here, and others have explained it far more exhaustively than I can. This article on preventing XSS in ASP.NET explains what XSS is and how you can prevent it.

Upvotes: 1
