M_K

Reputation: 3455

How to decode a Unicode character in a string

How do I decode the string 'Sch\u00f6nen' (@"Sch\u00f6nen") in C#? I've tried HttpUtility, but it doesn't give me the result I need, which is "Schönen".

Upvotes: 42

Views: 41718

Answers (3)

Dmitrii Matunin

Reputation: 747

I wrote some code that converts \uXXXX escape sequences in a string to the actual characters. (That said, the top answer in this thread works fine and is less complex.)

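using System;
using System.Text;
using System.Text.RegularExpressions;

string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
// Regex.Split keeps the captured hex digits, so the result alternates:
// even indices hold literal text, odd indices hold the four hex digits of an escape.
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([0-9a-fA-F]{4})");
var builder = new StringBuilder();
for (int i = 0; i < splitted.Length; i++)
{
    if (i % 2 == 1)
    {
        // Convert the hex digits to the UTF-16 code unit they encode.
        builder.Append((char) Convert.ToUInt16(splitted[i], 16));
    }
    else
    {
        // Literal text is copied through unchanged, even if it happens to be four characters long.
        builder.Append(splitted[i]);
    }
}
string outString = builder.ToString();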

Upvotes: 1

findcaiyzh

Reputation: 647

If you landed on this question because you see "Sch\u00f6nen" (or similar \uXXXX values in a string constant), note that this is not an encoding. It is an escape sequence that represents a Unicode character, much like \n represents a newline and \r a carriage return.

In that case I don't think you have to decode anything.

string unicodestring = "Sch\u00f6nen";
Console.WriteLine(unicodestring);

This prints Schönen.
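To make the distinction concrete, here is a minimal sketch comparing a regular string literal with a verbatim one. The class name EscapeDemo is made up for illustration. It compares lengths instead of printing ö, since console output depends on the console's encoding.

```csharp
using System;

class EscapeDemo
{
    static void Main()
    {
        // Regular literal: the compiler turns \u00f6 into the single character ö.
        string compiled = "Sch\u00f6nen";

        // Verbatim literal: the backslash, 'u' and four hex digits stay as six separate characters.
        string verbatim = @"Sch\u00f6nen";

        Console.WriteLine(compiled.Length);      // 7
        Console.WriteLine(verbatim.Length);      // 12
        Console.WriteLine(compiled == verbatim); // False
    }
}
```

Only the second form, which is also what you typically get when the text arrives at runtime from a file or an HTTP response, still contains an escape sequence that needs decoding.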

Upvotes: 4

M_K

Reputation: 3455

Regex.Unescape did the trick:

System.Text.RegularExpressions.Regex.Unescape(@"Sch\u00f6nen");

Note that you need to be careful when testing your variants or writing unit tests: "Sch\u00f6nen" is already "Schönen", because the compiler resolves the escape. You need @ in front of the string literal so that \u00f6 stays in the string as literal text.
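A unit-test-style check along these lines might look like the sketch below. The class UnescapeDemo and the helper Decode are hypothetical names used only for illustration. Regex.Unescape is the real framework method from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnescapeDemo
{
    // Hypothetical helper: a thin wrapper so the call is easy to test.
    public static string Decode(string raw) => Regex.Unescape(raw);

    static void Main()
    {
        // The input must be a verbatim literal, otherwise there is nothing left to decode.
        string raw = @"Sch\u00f6nen";
        string decoded = Decode(raw);

        Console.WriteLine(raw.Length);                // 12
        Console.WriteLine(decoded.Length);            // 7
        Console.WriteLine(decoded == "Sch\u00f6nen"); // True

        // Regex.Unescape handles other escapes too, for example \t.
        Console.WriteLine(Decode(@"a\tb") == "a\tb"); // True
    }
}
```

Because Regex.Unescape processes every escape it recognizes and not only \uXXXX, input that contains other backslashes (Windows paths, for instance) may be changed or rejected, so it is worth checking against your real data.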

Upvotes: 93
