user2388013

Reputation: 881

Decoding a special character in C#

I am wondering how I could decode the special character • to HTML?

I have tried using System.Web.HttpUtility.HtmlDecode, but no luck yet.

Upvotes: 5

Views: 2683

Answers (2)

drf

Reputation: 8699

The issue here is not HTML decoding. The text was encoded in one character set (UTF-8), read back using a different one (e.g., windows-1252), and possibly encoded again as UTF-8.

In UTF-8, • is encoded as E2 80 A2. When this byte sequence is read using windows-1252, E2 80 A2 decodes as •. (Saved again as UTF-8, "• Test" becomes C3 A2 E2 82 AC C2 A2 20 54 65 73 74.)
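The byte-level round trip described above can be reproduced directly. This is a minimal sketch assuming .NET 5 or later (top-level statements), where windows-1252 is only available after registering `CodePagesEncodingProvider`; on .NET Framework that registration line can be dropped. The variable names are illustrative.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ ship only a few encodings (UTF-8, UTF-16, ASCII, ...) by default;
// registering this provider makes code pages such as windows-1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding("windows-1252");

string original = "\u2022 Test";                      // "• Test"
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);  // the bytes a UTF-8 writer produces
string garbled = win1252.GetString(utf8Bytes);        // those bytes misread as windows-1252
byte[] resaved = Encoding.UTF8.GetBytes(garbled);     // the garbled text saved again as UTF-8

Console.WriteLine(BitConverter.ToString(utf8Bytes));  // E2-80-A2-20-54-65-73-74
Console.WriteLine(garbled);                           // "â€¢ Test" (console font permitting)
Console.WriteLine(BitConverter.ToString(resaved));    // C3-A2-E2-82-AC-C2-A2-20-54-65-73-74
```

Each misread byte becomes its own character (E2 → â, 80 → €, A2 → ¢), which is why one bullet turns into three characters and then seven bytes once saved again.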

If the file really is windows-1252-encoded, it can simply be read with the correct encoding, for example by passing that encoding to the StreamReader constructor:

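// On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
new StreamReader(..., Encoding.GetEncoding("windows-1252"));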
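The same constructor overload applies whichever encoding is the correct one. The sketch below writes "• Test" to a temporary file as UTF-8 and reads it back twice: once misread as windows-1252, reproducing the garbled text from the question, and once with the encoding the file was actually written in. It assumes .NET 5 or later with `CodePagesEncodingProvider` registered; `ReadAll` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+ to make windows-1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding("windows-1252");

// Write the text to a temporary file as UTF-8 without a byte-order mark.
string path = Path.GetTempFileName();
File.WriteAllText(path, "\u2022 Test", new UTF8Encoding(false));

// Reading with the wrong encoding produces the garbled text from the question.
string wrong = ReadAll(path, win1252);
// Reading with the encoding the file was written in gives the intended text.
string right = ReadAll(path, Encoding.UTF8);

Console.WriteLine(wrong);  // "â€¢ Test"
Console.WriteLine(right);  // "• Test"
File.Delete(path);

// Hypothetical helper: read a whole file with an explicit encoding.
static string ReadAll(string filePath, Encoding encoding)
{
    using var reader = new StreamReader(filePath, encoding);
    return reader.ReadToEnd();
}
```

Because this `StreamReader` overload still honors a byte-order mark when one is present, the explicit encoding matters most for files saved without one, like the BOM-less file written here.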

If the file was saved with an incorrect encoding, the encoding can be reversed in some cases. For instance, for the string sequence in your question, you can write:

using System.Text;

string s = "•"; // the improperly decoded string from the question
var b = Encoding.GetEncoding("windows-1252").GetBytes(s); // b = E2 80 A2
string c = Encoding.UTF8.GetString(b);  // c = "•"
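Because the reversal only works when the garbled text survived intact, it can help to make the UTF-8 decoding step strict, so that a failed repair is reported rather than silently producing U+FFFD replacement characters. A minimal sketch, assuming .NET 5 or later with `CodePagesEncodingProvider` registered; `TryReverse` is a hypothetical helper name.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding("windows-1252");
// throwOnInvalidBytes: true makes GetString throw instead of emitting U+FFFD.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

// "\u00E2\u20AC\u00A2" is the garbled "â€¢" from the question.
Console.WriteLine(TryReverse("\u00E2\u20AC\u00A2", out string fixedText) ? fixedText : "not repairable");
// "Café" was never garbled: its windows-1252 bytes are not valid UTF-8.
Console.WriteLine(TryReverse("Caf\u00E9", out _) ? "repaired" : "not repairable");

// Hypothetical helper: undo a UTF-8 -> windows-1252 misread, or return false
// if the re-encoded bytes are not valid UTF-8.
bool TryReverse(string garbled, out string repaired)
{
    byte[] bytes = win1252.GetBytes(garbled);
    try
    {
        repaired = strictUtf8.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        repaired = garbled; // leave the input untouched when it cannot be reversed
        return false;
    }
}
```

Pure ASCII input also passes this check unchanged, so a `true` result means only that the repair is possible, not that the text was actually garbled.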

Note that many common typographic characters, such as "smart quotes", bullets, and dashes, are in the range U+2000 to U+203F (part of Unicode's General Punctuation block), and every character in that range is encoded in UTF-8 as E2 80 followed by one more byte. Thus, the sequence â€?, where ? is any character, will typically signify this type of encoding error. This allows this type of error to be corrected more broadly:

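using System.Text;
using System.Text.RegularExpressions;

static string CorrectText(string input)
{
    var winencoding = Encoding.GetEncoding("windows-1252");
    // "â€" is how the UTF-8 lead bytes E2 80 look when read as windows-1252;
    // re-encode each three-character match and decode it as UTF-8.
    return Regex.Replace(input, "â€.",
        m => Encoding.UTF8.GetString(winencoding.GetBytes(m.Value)));
}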

Calling this function with text malformed in this way will correct some (but not all) errors. For instance, CorrectText("•Testâ€“orâ€œ") will return the intended •Test–or“.
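For a self-contained check of the example in the preceding paragraph, the program below wraps `CorrectText` with the `using` directives it needs and writes the garbled input with `\u` escapes, so the source file's own encoding cannot interfere. It assumes .NET 5 or later, where windows-1252 must be enabled by registering `CodePagesEncodingProvider` before `CorrectText` runs; the variable name `garbled` is illustrative.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// "â€¢Testâ€“orâ€œ": the bullet, en dash and left double quote, each misread
// from its three UTF-8 bytes (E2 80 A2, E2 80 93, E2 80 9C) as windows-1252.
string garbled = "\u00E2\u20AC\u00A2Test\u00E2\u20AC\u201Cor\u00E2\u20AC\u0153";
Console.WriteLine(CorrectText(garbled)); // •Test–or“

static string CorrectText(string input)
{
    var winencoding = Encoding.GetEncoding("windows-1252");
    // Each "â€" plus one character is re-encoded as windows-1252 bytes
    // and decoded again as UTF-8.
    return Regex.Replace(input, "\u00E2\u20AC.",
        m => Encoding.UTF8.GetString(winencoding.GetBytes(m.Value)));
}
```

Text without the "â€" prefix, including plain ASCII, is returned unchanged, which is what makes the regex-driven approach safe to run over mixed input.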

Upvotes: 5

Tom F

Reputation: 817

HtmlDecode is for converting HTML-encoded strings into readable text. Perhaps HtmlEncode is what you're actually looking for.
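To illustrate the two directions, the sketch below uses `System.Net.WebUtility`, which offers the same `HtmlEncode`/`HtmlDecode` pair as `System.Web.HttpUtility` without a System.Web reference. It assumes .NET 5 or later for the top-level statements; the sample strings are illustrative.

```csharp
using System;
using System.Net;

// HtmlDecode turns character references back into characters...
string fromNumeric = WebUtility.HtmlDecode("&#8226; Test");  // "• Test"
// ...but text containing no '&' references passes through unchanged,
// so it cannot repair "â€¢", which is an encoding problem rather than HTML.
string untouched = WebUtility.HtmlDecode("\u00E2\u20AC\u00A2 Test");
// HtmlEncode goes the other way, escaping HTML-significant characters.
string encoded = WebUtility.HtmlEncode("<b>Test</b>");       // "&lt;b&gt;Test&lt;/b&gt;"

Console.WriteLine(fromNumeric);
Console.WriteLine(untouched == "\u00E2\u20AC\u00A2 Test");    // True
Console.WriteLine(encoded);
```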

Upvotes: 2
