Agrawal Shraddha


What is difference between WebUtility.HtmlEncode and AntiXssEncoder.HtmlEncode?

AntiXssEncoder.HtmlEncode is only supported on .NET Framework. Can I use WebUtility.HtmlEncode for anti-XSS encoding instead, given that our application targets .NET Core 2.1?


Answers (1)

Dai


TL;DR:

AntiXssEncoder.HtmlEncode is only supported on .NET Framework. Can I use WebUtility.HtmlEncode for anti-XSS encoding instead, given that our application targets .NET Core 2.1?

Correct.

But I want to stress that there is no such thing as an "anti-XSS HTML-encoder": every correctly implemented HTML encoder will protect your website from XSS attacks when used correctly.

  • (I don't know why Microsoft named it AntiXssEncoder, but the fact that the main HtmlEncode implementation at the time was actually buggy and insecure probably had something to do with it. That's ancient history now.)

In .NET Core 2.1, you only need to use System.Net.WebUtility.HtmlEncode.
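As a rough sketch of what that looks like in practice, the following encodes an untrusted value before placing it inside an apostrophe-delimited attribute. The class and method names are made up for illustration; only WebUtility.HtmlEncode comes from the framework. The input is the same attack string used in the WebForms example further down.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
public static class WebUtilityEncodingDemo
{
    // Builds an <img> tag whose src attribute is delimited by apostrophes,
    // HTML-encoding the untrusted value first.
    public static string BuildImgTag(string untrustedSrc)
    {
        string encoded = WebUtility.HtmlEncode(untrustedSrc);
        return "<img src='" + encoded + "' />";
    }

    public static void Main()
    {
        string userProvidedValue = "bad.gif' onerror='alert()";

        // The apostrophes come out as &#39;, so the value cannot
        // terminate the src attribute early.
        Console.WriteLine(BuildImgTag(userProvidedValue));
    }
}
```

Encoding only helps if it's applied to every untrusted value at the point where it's written into the markup, which is the "when used correctly" part.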

In other .NET releases (especially historical versions) things are more complicated; read on if you dare...

Why AntiXssEncoder (aka AntiXss and AntiXss.Encoder) exists - and why it's obsolete in 2021:

  • The AntiXssEncoder class from the AntiXss NuGet package (aka Microsoft.Security.Application.AntiXss) is obsolete, and has been since 2014, when it was moved to System.Web.Security.AntiXss.

    • The other classes: AntiXssEncoder, Encoder, and AntiXss are just alternative APIs for the same underlying implementation in Encoder btw.
  • The AntiXssEncoder in System.Web.Security.AntiXss is not available in .NET Core 2.1. However this is not a significant problem:

    • The original Microsoft.Security.Application.AntiXss was created because HttpUtility.HtmlEncode was considered insecure because it did not encode single-apostrophe characters, so XSS attacks were possible against ASP.NET 1.x and ASP.NET 2.x WebForms (.aspx) pages that used single-apostrophes to delimit HTML attributes that contained user-specified values.

      • For example:

        String userProvidedValue = "bad.gif' onerror='alert()";
        <img src='<%= this.Server.HtmlEncode( userProvidedValue ) %>' />
        

        ...which will be rendered as:

        <img src='bad.gif' onerror='alert()' />
        
      • However this issue was fixed in ASP.NET 4.0 when HttpUtility.HtmlEncode was corrected to also HTML-encode those apostrophes. So the exact same code above will now be rendered as below, which won't show an alert():

        <img src='bad.gif&#39; onerror=&#39;alert()' />
        
    • AntiXssEncoder also supported specifying a list of excluded Unicode code-points or Char values. This was added because AntiXssEncoder defaulted to numerically encoding every Char value (not code-point!) outside a fairly small default safe-list (going by the tables below, Latin characters such as ÿ and ā pass through, but U+1E02 and Arabic text do not), which unfortunately meant that even completely safe text in Arabic, Hebrew, Kanji, etc. would be escaped, making the raw HTML almost unreadable and ballooning the output HTML length.

      • For example the (gibberish) string "لك أن كلا" would be rendered as "&#1604;&#1603; &#1571;&#1606; &#1603;&#1604;&#1575;" - which isn't good.

      • Fortunately AntiXssEncoder.MarkAsSafe can be used to exclude character ranges at the programmer's discretion.

      • By the time .NET Core 2.1 came out, the System.Net.WebUtility class (not to be confused with System.Web.HttpUtility, of course) had been improved so that it does not unnecessarily encode high Char values and does HTML-encode apostrophes, which is why AntiXssEncoder is no longer needed.

  • In .NET Core 3.1 (and later, including .NET 5 and .NET 6) things improved further, but also got a bit confusing...

    • Things got better because System.Text.Encodings.Web.HtmlEncoder was added. This is a separate implementation (instead of simply wrapping WebUtility) which brings back AntiXssEncoder's ability to exclude ranges of characters from encoding just in case you need that functionality. But it's a bit of an edge-case, imo.
      • You can do this by calling HtmlEncoder.Create(TextEncoderSettings) with a configured TextEncoderSettings object with the required char ranges excluded.
  • In .NET Core 3.1, for the sake of back-compat, Microsoft brought back System.Web.HttpUtility, however this is just another wrapper over WebUtility.HtmlEncode.

    • It does also have HtmlAttributeEncode - which does not encode single-apostrophes. There is no good reason to use this method, imo. I'm surprised Microsoft hasn't annotated it with [Obsolete], actually.
  • However, in .NET Core (and .NET 5 and later) there isn't any way to HTML-encode text such that named entities are used instead of numeric entities (other than &lt;, &gt; and &amp;).

    • Previously, the AntiXssEncoder.HtmlEncode method (both Microsoft.Security and System.Web.Security) had a useNamedEntities parameter which involved a large hard-coded table of known entity names, e.g. £ becomes &pound; instead of &#163;.
      • I imagine they removed this functionality because you cannot safely use named entities unless all other software in your HTML-processing pipeline also supports them, and given it's a large table I expect that lots of people had issues with it breaking poorly-updated downstream code.
      • The HTML Living Standard (aka "HTML5") calls them character references instead of "entities" (a holdover from SGML DTDs) and defines the &#nnnn; syntax as a means of encoding Unicode code-points specifically, as opposed to character values in some other encoding scheme. HTML4, by contrast, refers to ISO 10646 (aka UCS) characters, which is not Unicode as we know it today. (I suspect that browsers may have tried to map characters based on the document's encoding/code-page if the page wasn't encoded using Unicode (e.g. Shift-JIS), but I might be wrong.)
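To illustrate the range-exclusion point above, here's a minimal sketch, assuming System.Text.Encodings.Web is available to your project. HtmlEncoderRangesDemo and its method names are hypothetical; HtmlEncoder.Create, TextEncoderSettings and UnicodeRanges are the real APIs.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Hypothetical demo class, for illustration only.
public static class HtmlEncoderRangesDemo
{
    // Custom encoder that treats the Arabic block as safe in addition
    // to Basic Latin. HTML-sensitive characters are still encoded.
    private static readonly HtmlEncoder ArabicFriendly = HtmlEncoder.Create(
        new TextEncoderSettings(UnicodeRanges.BasicLatin, UnicodeRanges.Arabic));

    // HtmlEncoder.Default only lets Basic Latin through unescaped.
    public static string EncodeDefault(string text)
    {
        return HtmlEncoder.Default.Encode(text);
    }

    public static string EncodeAllowingArabic(string text)
    {
        return ArabicFriendly.Encode(text);
    }

    public static void Main()
    {
        string arabic = "لك أن كلا";

        // Hex character references for every Arabic letter.
        Console.WriteLine(EncodeDefault(arabic));

        // The Arabic letters pass through unchanged.
        Console.WriteLine(EncodeAllowingArabic(arabic));

        // Markup characters are still encoded with the custom encoder.
        Console.WriteLine(EncodeAllowingArabic("<b>'x'</b>"));
    }
}
```

Creating the encoder once and reusing it (as the static field does here) avoids rebuilding the allow-list on every call.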

Finally, here are tables comparing the output of the different HtmlEncode methods found in .NET as of 2021:

HtmlEncode methods available in .NET Framework 4.8

  • Note that the following HtmlEncode methods are excluded because they're just wrappers over other implementations:
    • System.Web.HttpServerUtility (aka Server.HtmlEncode) just forwards to HttpUtility.HtmlEncode.
    • System.Web.UI.HtmlTextWriter.WriteEncodedText also forwards to HttpUtility.HtmlEncode.
    • System.Web.HttpUtility.HtmlEncode:
      • In ASP.NET 1.x and ASP.NET 2.0 (2001 and 2005 respectively) this is incorrectly implemented such that it does not escape apostrophes. I've included results from that implementation in the "System.Web.HttpUtility.HtmlEncode (ASP.NET 1.1 and 2.0)" column, for historical curiosity.
      • In ASP.NET 4.x, the HttpUtility.HtmlEncode method just forwards to System.Web.Util.HttpEncoder.Current.HtmlEncode(s)
        • Note that System.Web.Util.HttpEncoder.Current can be replaced at runtime, which is how an update to ASP.NET 4.x (I forget which) was able to make almost everyone use (the then far-better) AntiXssEncoder without people needing to change their existing application code. Neat.
        • Also note that System.Web.Util.HttpEncoder.Current can point to any compatible implementation, while System.Web.Util.HttpEncoder.Default is always just a wrapper over WebUtility.HtmlEncode.
    • System.Web.Util.HttpEncoder.Default - as mentioned above, this is just another System.Net.WebUtility wrapper.

| # | Input | Code-point(s) | UTF-8 bytes | UTF-16 bytes | System.Net.WebUtility.HtmlEncode | System.Text.Encodings.Web.HtmlEncoder | System.Web.Security.AntiXss.AntiXssEncoder.HtmlEncode(false) | System.Web.Security.AntiXss.AntiXssEncoder.HtmlEncode(true) |
|---|---|---|---|---|---|---|---|---|
| 0 | abc | `U+0061 U+0062 U+0063` | `61 62 63` | `61 00 62 00 63 00` | `abc` | `abc` | `abc` | `abc` |
| 1 | < | `U+003C` | `3C` | `3C 00` | `&lt;` | `&lt;` | `&lt;` | `&lt;` |
| 2 | > | `U+003E` | `3E` | `3E 00` | `&gt;` | `&gt;` | `&gt;` | `&gt;` |
| 3 | & | `U+0026` | `26` | `26 00` | `&amp;` | `&amp;` | `&amp;` | `&amp;` |
| 4 | " | `U+0022` | `22` | `22 00` | `&quot;` | `&quot;` | `&quot;` | `&quot;` |
| 5 | ' | `U+0027` | `27` | `27 00` | `&#39;` | `&#x27;` | `&#39;` | `&#39;` |
| 6 | Ÿ | `U+009F` | `C2 9F` | `9F 00` | `Ÿ` | `&#x9F;` | `&#159;` | `&#159;` |
| 7 | (no-break space) | `U+00A0` | `C2 A0` | `A0 00` | `&#160;` | `&#xA0;` | `&#160;` | `&nbsp;` |
| 8 | ÿ | `U+00FF` | `C3 BF` | `FF 00` | `&#255;` | `&#xFF;` | `ÿ` | `&yuml;` |
| 9 | ā | `U+0101` | `C4 81` | `01 01` | `ā` | `&#x101;` | `ā` | `ā` |
| 10 | ~ | `U+007E` | `7E` | `7E 00` | `~` | `~` | `~` | `~` |
| 11 | (DEL control character) | `U+007F` | `7F` | `7F 00` | (unencoded DEL) | `&#x7F;` | `&#127;` | `&#127;` |
| 12 | £ | `U+00A3` | `C2 A3` | `A3 00` | `&#163;` | `&#xA3;` | `£` | `&pound;` |
| 13 | ÿ | `U+00FF` | `C3 BF` | `FF 00` | `&#255;` | `&#xFF;` | `ÿ` | `&yuml;` |
| 14 | Ḃ | `U+1E02` | `E1 B8 82` | `02 1E` | `Ḃ` | `&#x1E02;` | `&#7682;` | `&#7682;` |
| 15 | 💩 | `U+1F4A9` | `F0 9F 92 A9` | `3D D8 A9 DC` | `&#128169;` | `&#x1F4A9;` | `&#128169;` | `&#128169;` |
| 16 | 𣎴 | `U+233B4` | `F0 A3 8E B4` | `4C D8 B4 DF` | `&#144308;` | `&#x233B4;` | `&#144308;` | `&#144308;` |
| 17 | 𣎴 | `U+233B4` | `F0 A3 8E B4` | `4C D8 B4 DF` | `&#144308;` | `&#x233B4;` | `&#144308;` | `&#144308;` |
| 18 | لك أن كلا | `U+0644 U+0643 U+0020 U+0623 U+0646 U+0020 U+0643 U+0644 U+0627` | `D9 84 D9 83 20 D8 A3 D9 86 20 D9 83 D9 84 D8 A7` | `44 06 43 06 20 00 23 06 46 06 20 00 43 06 44 06 27 06` | `لك أن كلا` | `&#x644;&#x643; &#x623;&#x646; &#x643;&#x644;&#x627;` | `&#1604;&#1603; &#1571;&#1606; &#1603;&#1604;&#1575;` | `&#1604;&#1603; &#1571;&#1606; &#1603;&#1604;&#1575;` |

Obsolete and historical HtmlEncode methods:

This table is included only for computer-archaeological reasons. It does not apply to .NET Framework 4.8, nor to any version of ASP.NET Core or ASP.NET on .NET 5 or later.

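| # | Input | Code-point(s) | UTF-8 bytes | UTF-16 bytes | System.Web.HttpUtility.HtmlEncode (ASP.NET 1.1 and 2.0) | Microsoft.Security.Application.Encoder.HtmlEncode(false) | Microsoft.Security.Application.Encoder.HtmlEncode(true) |
|---|---|---|---|---|---|---|---|
| 0 | abc | `U+0061 U+0062 U+0063` | `61 62 63` | `61 00 62 00 63 00` | `abc` | `abc` | `abc` |
| 1 | < | `U+003C` | `3C` | `3C 00` | `&lt;` | `&lt;` | `&lt;` |
| 2 | > | `U+003E` | `3E` | `3E 00` | `&gt;` | `&gt;` | `&gt;` |
| 3 | & | `U+0026` | `26` | `26 00` | `&amp;` | `&amp;` | `&amp;` |
| 4 | " | `U+0022` | `22` | `22 00` | `&quot;` | `&quot;` | `&quot;` |
| 5 | ' | `U+0027` | `27` | `27 00` | `'` | `&#39;` | `&#39;` |
| 6 | Ÿ | `U+009F` | `C2 9F` | `9F 00` | `Ÿ` | `&#159;` | `&#159;` |
| 7 | (no-break space) | `U+00A0` | `C2 A0` | `A0 00` | `&#160;` | `&#160;` | `&nbsp;` |
| 8 | ÿ | `U+00FF` | `C3 BF` | `FF 00` | `&#255;` | `ÿ` | `&yuml;` |
| 9 | ā | `U+0101` | `C4 81` | `01 01` | `ā` | `ā` | `ā` |
| 10 | ~ | `U+007E` | `7E` | `7E 00` | `~` | `~` | `~` |
| 11 | (DEL control character) | `U+007F` | `7F` | `7F 00` | (unencoded DEL) | `&#127;` | `&#127;` |
| 12 | £ | `U+00A3` | `C2 A3` | `A3 00` | `&#163;` | `£` | `&pound;` |
| 13 | ÿ | `U+00FF` | `C3 BF` | `FF 00` | `&#255;` | `ÿ` | `&yuml;` |
| 14 | Ḃ | `U+1E02` | `E1 B8 82` | `02 1E` | `Ḃ` | `&#7682;` | `&#7682;` |
| 15 | 💩 | `U+1F4A9` | `F0 9F 92 A9` | `3D D8 A9 DC` | `💩` | `&#128169;` | `&#128169;` |
| 16 | 𣎴 | `U+233B4` | `F0 A3 8E B4` | `4C D8 B4 DF` | `𣎴` | `&#144308;` | `&#144308;` |
| 17 | 𣎴 | `U+233B4` | `F0 A3 8E B4` | `4C D8 B4 DF` | `𣎴` | `&#144308;` | `&#144308;` |
| 18 | لك أن كلا | `U+0644 U+0643 U+0020 U+0623 U+0646 U+0020 U+0643 U+0644 U+0627` | `D9 84 D9 83 20 D8 A3 D9 86 20 D9 83 D9 84 D8 A7` | `44 06 43 06 20 00 23 06 46 06 20 00 43 06 44 06 27 06` | `لك أن كلا` | `&#1604;&#1603; &#1571;&#1606; &#1603;&#1604;&#1575;` | `&#1604;&#1603; &#1571;&#1606; &#1603;&#1604;&#1575;` |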

HtmlEncode methods in .NET 5

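| # | Input | Code-point / Runes | UTF-8 bytes | UTF-16 bytes | System.Net.WebUtility.HtmlEncode | System.Web.HttpUtility.HtmlEncode (.NET 5) | System.Text.Encodings.Web.HtmlEncoder |
|---|---|---|---|---|---|---|---|
| 0 | abc | `97 98 99` | `61 62 63` | `61 00 62 00 63 00` | `abc` | `abc` | `abc` |
| 1 | < | `60` | `3C` | `3C 00` | `&lt;` | `&lt;` | `&lt;` |
| 2 | > | `62` | `3E` | `3E 00` | `&gt;` | `&gt;` | `&gt;` |
| 3 | & | `38` | `26` | `26 00` | `&amp;` | `&amp;` | `&amp;` |
| 4 | " | `34` | `22` | `22 00` | `&quot;` | `&quot;` | `&quot;` |
| 5 | ' | `39` | `27` | `27 00` | `&#39;` | `&#39;` | `&#x27;` |
| 6 | Ÿ | `159` | `C2 9F` | `9F 00` | `Ÿ` | `Ÿ` | `&#x9F;` |
| 7 | (no-break space) | `160` | `C2 A0` | `A0 00` | `&#160;` | `&#160;` | `&#xA0;` |
| 8 | ÿ | `255` | `C3 BF` | `FF 00` | `&#255;` | `&#255;` | `&#xFF;` |
| 9 | ā | `257` | `C4 81` | `01 01` | `ā` | `ā` | `&#x101;` |
| 10 | ~ | `126` | `7E` | `7E 00` | `~` | `~` | `~` |
| 11 | (DEL control character) | `127` | `7F` | `7F 00` | (unencoded DEL) | (unencoded DEL) | `&#x7F;` |
| 12 | £ | `163` | `C2 A3` | `A3 00` | `&#163;` | `&#163;` | `&#xA3;` |
| 13 | ÿ | `255` | `C3 BF` | `FF 00` | `&#255;` | `&#255;` | `&#xFF;` |
| 14 | Ḃ | `7682` | `E1 B8 82` | `02 1E` | `Ḃ` | `Ḃ` | `&#x1E02;` |
| 15 | 💩 | `128169` | `F0 9F 92 A9` | `3D D8 A9 DC` | `&#128169;` | `&#128169;` | `&#x1F4A9;` |
| 16 | 𣎴 | `144308` | `F0 A3 8E B4` | `4C D8 B4 DF` | `&#144308;` | `&#144308;` | `&#x233B4;` |
| 17 | 𣎴 | `144308` | `F0 A3 8E B4` | `4C D8 B4 DF` | `&#144308;` | `&#144308;` | `&#x233B4;` |
| 18 | لك أن كلا | `1604 1603 32 1571 1606 32 1603 1604 1575` | `D9 84 D9 83 20 D8 A3 D9 86 20 D9 83 D9 84 D8 A7` | `44 06 43 06 20 00 23 06 46 06 20 00 43 06 44 06 27 06` | `لك أن كلا` | `لك أن كلا` | `&#x644;&#x643; &#x623;&#x646; &#x643;&#x644;&#x627;` |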
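If you want to check rows like these against your own runtime, something along these lines should work on .NET Core 3.0 or later (string.EnumerateRunes isn't available before that). EncoderTableRow and its helpers are my own names, not part of any library, and this is only a sketch of how such a table could be produced, not necessarily how the one above was generated.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.Encodings.Web;

// Hypothetical helper, for illustration only.
public static class EncoderTableRow
{
    // Formats bytes as space-separated upper-case hex, e.g. "C2 A3".
    public static string HexBytes(byte[] bytes)
    {
        return string.Join(" ", bytes.Select(b => b.ToString("X2")));
    }

    // Decimal code points (runes), one per Unicode scalar value.
    public static string CodePoints(string input)
    {
        return string.Join(" ", input.EnumerateRunes().Select(r => r.Value.ToString()));
    }

    // One row: input, runes, UTF-8 bytes, UTF-16 (little-endian) bytes,
    // then the output of each of the three encoders.
    public static string[] BuildRow(string input)
    {
        return new[]
        {
            input,
            CodePoints(input),
            HexBytes(Encoding.UTF8.GetBytes(input)),
            HexBytes(Encoding.Unicode.GetBytes(input)),
            WebUtility.HtmlEncode(input),
            System.Web.HttpUtility.HtmlEncode(input),
            HtmlEncoder.Default.Encode(input),
        };
    }

    public static void Main()
    {
        foreach (string input in new[] { "abc", "'", "£", "\U0001F4A9" })
        {
            Console.WriteLine(string.Join(" | ", BuildRow(input)));
        }
    }
}
```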
