RB.

Reputation: 37232

How to "HTML encode" Em Dash in Visual Basic.NET

I am generating some text to be shown on a website, and I use HttpUtility.HtmlEncode to ensure it displays correctly. However, this method does not appear to encode the em dash (it should convert it to "&#8212;").

I have come up with a solution, but I'm sure there is a better way of doing it - some library function or something.

sWebsiteText = _
    "<![CDATA[" & _
    HttpUtility.HtmlEncode(sSomeText) & _
    "]]>"

'This is the bit which seems "hacky"'
sWebsiteText = _
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

So my question is - how would you implement the "hacky" part?

Many thanks,

RB.

Upvotes: 0

Views: 10763

Answers (3)

Frederic

Reputation:

Bobince's answer addresses what seems to be your main concern: replacing the HtmlDecode call with a more direct declaration of the character to replace.
Rewrite

sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

as

sWebsiteText.Replace(ChrW(&H2013), "&#x2013;")

(U+2014 (decimal 8212) is the em dash; U+2013 (decimal 8211) is the en dash. VB.NET string literals have no "\u2013"-style escapes, hence ChrW(&H2013) here; in C# you would write "\u2013".)
For readability, "&#x2013;" may be preferable to "&#8211;", since the .NET declaration of the char is in hex too. But as decimal notation seems more common in HTML, I personally would prefer "&#8211;".
For reuse, you should probably write your own HtmlEncode function in a custom utility class, so that you can call it from anywhere in your site without duplicating it.
(Something like this (sorry, I wrote it in C#, forgetting your examples were in VB):

using System.Text;
using System.Web;

/// <summary>
/// Supplies some custom processing to some HttpUtility functions.
/// </summary>
public static class CustomHttpUtility
{
    /// <summary>
    /// Html encodes a string.
    /// </summary>
    /// <param name="input">string to be encoded.</param>
    /// <returns>A html encoded string.</returns>
    public static string HtmlEncode(string input)
    {
        if (input == null)
            return null;
        StringBuilder encodedString = new StringBuilder(
            HttpUtility.HtmlEncode(input));
        // en dash (U+2013, &#8211;)
        encodedString.Replace("\u2013", "&#x2013;");
        // add other missing replacements here, as for the em dash (U+2014, &#8212;)
        encodedString.Replace("\u2014", "&#x2014;");
        //...

        return encodedString.ToString();
    }
}

Then replace

sWebsiteText = _
    "<![CDATA[" & _
    HttpUtility.HtmlEncode(sSomeText) & _
    "]]>"
'This is the bit which seems "hacky"'
sWebsiteText = _
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

With:

sWebsiteText = _
    "<![CDATA[" & _
    CustomHttpUtility.HtmlEncode(sSomeText) & _
    "]]>"

)
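Since the question's code is VB.NET, here is a rough VB.NET sketch of the same idea as the C# class above. It assumes a reference to System.Web. The module keeps the CustomHttpUtility name from the C# version, but the field names EnDash and EmDash are only illustrative. Treat it as a starting point rather than a tested implementation.

```vb
Imports System.Text
Imports System.Web

' Hypothetical VB.NET counterpart of the C# CustomHttpUtility class.
Public Module CustomHttpUtility

    ' VB.NET string literals have no \u escapes, so the characters are built with ChrW.
    Private ReadOnly EnDash As String = ChrW(&H2013)
    Private ReadOnly EmDash As String = ChrW(&H2014)

    ' HTML-encodes a string, then replaces the dashes that
    ' HttpUtility.HtmlEncode leaves untouched with numeric references.
    Public Function HtmlEncode(ByVal input As String) As String
        If input Is Nothing Then Return Nothing

        Dim encoded As New StringBuilder(HttpUtility.HtmlEncode(input))
        encoded.Replace(EnDash, "&#8211;")
        encoded.Replace(EmDash, "&#8212;")
        ' Add other missing replacements here.

        Return encoded.ToString()
    End Function

End Module
```

The calling code then stays as shown above: CustomHttpUtility.HtmlEncode(sSomeText) in place of HttpUtility.HtmlEncode(sSomeText).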

Upvotes: 0

bobince

Reputation: 536775

As this character is not an ASCII character, how do I encode it?

It's not an ASCII character, but it is a Unicode character, U+2014. If your page output is going to be UTF-8, which in this day and age it really should be, you don't need to HTML-encode it, just output the character directly.
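As a minimal sketch of that approach in VB.NET, assuming an ASP.NET page or handler that has an HttpResponse available: the module name Utf8Output and the WriteEncoded helper are hypothetical, while ContentEncoding, Charset and Write are existing HttpResponse members.

```vb
Imports System.Text
Imports System.Web

Public Module Utf8Output

    ' Hypothetical helper: declares the response as UTF-8 and writes the
    ' HTML-encoded text, leaving characters such as U+2014 as they are.
    Public Sub WriteEncoded(ByVal response As HttpResponse, ByVal text As String)
        response.ContentEncoding = Encoding.UTF8
        response.Charset = "utf-8"
        response.Write(HttpUtility.HtmlEncode(text))
    End Sub

End Module
```

With the output declared as UTF-8, no extra replacement step is needed for the dash.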

Are there other characters which are likely to give me problems.

What problems exactly is it giving you? If you can't output '—', you probably can't output any other non-ASCII Unicode character either, and there are thousands of them.

Replace "\u2014" with "&#x2014;" if you really must, but with today's Unicode-aware tools there should be no need to go around replacing every non-ASCII Unicode character with markup.
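If the output really has to be pure ASCII, one possible approach is to convert every remaining non-ASCII character to a numeric character reference in a single pass, rather than listing characters one by one. The sketch below is VB.NET and assumes a reference to System.Web. The module AsciiHtmlEncoder and the function HtmlEncodeToAscii are hypothetical names, not part of the framework.

```vb
Imports System.Text
Imports System.Web

Public Module AsciiHtmlEncoder

    ' Hypothetical helper: HTML-encodes the input, then rewrites every remaining
    ' non-ASCII character as a hexadecimal numeric character reference.
    Public Function HtmlEncodeToAscii(ByVal input As String) As String
        If input Is Nothing Then Return Nothing

        Dim encoded As String = HttpUtility.HtmlEncode(input)
        Dim result As New StringBuilder(encoded.Length)
        Dim i As Integer = 0

        While i < encoded.Length
            Dim c As Char = encoded(i)
            If AscW(c) < 128 Then
                ' Plain ASCII: copy as is.
                result.Append(c)
                i += 1
            ElseIf Char.IsSurrogatePair(encoded, i) Then
                ' Characters outside the BMP occupy two UTF-16 code units.
                result.Append("&#x").Append(Char.ConvertToUtf32(encoded, i).ToString("X")).Append(";")
                i += 2
            Else
                result.Append("&#x").Append(AscW(c).ToString("X")).Append(";")
                i += 1
            End If
        End While

        Return result.ToString()
    End Function

End Module
```

With this sketch an em dash in the input should come out as "&#x2014;". It trades a slightly larger output for not having to maintain a list of special cases.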

Upvotes: 3

mouviciel

Reputation: 67919

Take a look at A List Apart, as I suggested in the HTML Apostrophe question.

The em dash — is represented by &#8212;.

Upvotes: 0
