Hyzups

Reputation: 181

How to unescape unicode string in C#

I have a string, read from a text file, that contains Unicode escape sequences, and I want to display the actual characters.

For example:

\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b

When I read this string from the text file using StreamReader.ReadToLine(), the \ comes back escaped as '\\' (for example "\\u8ba1"), which is not what I want.

The output shows the escape sequences exactly as they appear in the file. What I want is to display the actual characters.

  1. How can I change "\\u8ba1" to "\u8ba1" in the resulting string?
  2. Or should I use a different reader to read the string?

Upvotes: 18

Views: 11088

Answers (3)

Dmytro

Reputation: 191

If you are using Newtonsoft.Json, the solution is quite simple:

var s = @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";
s = Newtonsoft.Json.JsonConvert.DeserializeObject<string>("\"" + s + "\"");

Upvotes: 0

rraallvv

Reputation: 2933

This question came up as the first result when I searched, but I thought there should be a simpler way. This is what I ended up using:

using System.Text.RegularExpressions;

//...

var str = "Ingl\\u00e9s";
var converted = Regex.Unescape(str);
Console.WriteLine($"{converted} {str != converted}"); // Inglés True
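One caveat worth checking before relying on this: Regex.Unescape decodes every regex escape it recognizes, not only \uXXXX, so other backslashes in the input (a Windows path, for instance) are changed too. A minimal sketch of that behavior follows; the class name RegexUnescapeDemo and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only to illustrate the behavior.
public static class RegexUnescapeDemo
{
    // Thin wrapper around Regex.Unescape so it is easy to call and check.
    public static string Unescape(string text) => Regex.Unescape(text);

    // True when unescaping turned a literal "\t" into a real tab character.
    public static bool IntroducesTab(string text) =>
        !text.Contains("\t") && Unescape(text).Contains("\t");

    // Example calls:
    //   RegexUnescapeDemo.Unescape(@"Ingl\u00e9s")   decodes \u00e9 as intended
    //   RegexUnescapeDemo.IntroducesTab(@"C:\temp")  the \t in the path is decoded as well
}
```

If the input can contain backslashes that are not Unicode escapes, a pattern that targets only \uXXXX is probably the safer choice.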

Upvotes: 5

dtb

Reputation: 217351

If you have a string like

var input1 = "\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

// input1 == "计算机•网络•技术类"

you don't need to unescape anything. It's just the string literal that contains the escape sequences, not the string itself.


If you have a string like

var input2 = @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

you can unescape it using the following regex:

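// Requires: using System.Globalization; and using System.Text.RegularExpressions;
var result = Regex.Replace(
    input2,
    @"\\[Uu]([0-9A-Fa-f]{4})",
    // Parse the four hex digits as a UTF-16 code unit and emit that character.
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));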

// result == "计算机•网络•技术类"
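For text read from a file, the regex above can be wrapped in a small reusable helper. This is only a sketch: the class name UnicodeEscapes and the method name Decode are made up here, and the file path in the usage comment is a placeholder. Note also that if "\\u8ba1" was seen in the debugger, that is most likely just the debugger showing a single backslash in C# literal form; the string in memory would still contain \u8ba1, which is what this pattern matches.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not part of any library.
public static class UnicodeEscapes
{
    // A literal backslash, 'u' or 'U', then exactly four hex digits.
    private static readonly Regex Pattern =
        new Regex(@"\\[Uu]([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces each \uXXXX sequence with the UTF-16 code unit it names.
    // Any other backslash sequence (\t, \n, paths) is left untouched.
    public static string Decode(string text) =>
        Pattern.Replace(text, m =>
            ((char)ushort.Parse(
                m.Groups[1].Value,
                NumberStyles.HexNumber,
                CultureInfo.InvariantCulture)).ToString());

    // Example usage (the path is a placeholder; needs using System.IO):
    //   foreach (var line in File.ReadLines("input.txt"))
    //       Console.WriteLine(UnicodeEscapes.Decode(line));
}
```

Because each match is converted to a single UTF-16 code unit, characters outside the Basic Multilingual Plane should also come out correctly as long as the file writes them as a surrogate pair of two \uXXXX sequences.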

Upvotes: 28
