klados

Reputation: 787

UTF-8 raw character? to normal string

I want to decode UTF-8 (or Unicode) escaped text into a normal string.

For example, I want to convert a string like "\uc778\uc0b0\uc544\uc5f0\uc2dc\uba58\ud2b8, \uce58\uba74\uc5f4\uad6c\uc804\uc0c9\uc81c" to readable text.

I struggled with System.Text.UTF8Encoding and Text.Encoding.UTF8.GetString(), but it's not working...

How can I solve the problem? It seems that the solution should be simple... If possible, it would be great if you could write the code in VB.NET.

Thank you for your advice!


Thanks for replying.

I think I didn't write my point clearly.

The question is that I want to convert "\uc885\ud569\uc9c4\ub8cc\uc2e4 \uacac\ud559 / \uce58\uacfc\uc758\uc0ac\uc724\ub9ac \ud1a0\ub860" (Unicode 'code', not 'character') to a readable string, for example "가나다라", or Chinese, or whatever.

Also, I need the .NET code to do that.

I tried:

theString = Convert.toString("\uc885\ud569");

I also tried:

Dim utf8Encoding As New System.Text.UTF8Encoding
Dim encodedString() As Byte
encodedString = utf8Encoding.GetBytes(encodedString) .....

and a few more, but nothing converts "\uc885\ud569" to "가나". (That's just an example. I understand that each '\u????' code matches a single character, for example '가'.)

Thank you!

Upvotes: 0

Views: 1570

Answers (2)

svick

Reputation: 245066

I think I finally understand what the problem is. A string like "\uc778\uc0b0" is exactly the same as "인산" in C# (and it's UTF-16, not UTF-8). But VB.NET doesn't understand such escape sequences.

I think the best option here would be to write the Korean characters directly; something like "인산" is valid VB.NET code.

If you really need to use C#-like escape sequences, you can use Regex.Unescape() (in the System.Text.RegularExpressions namespace):

Dim escaped = "\uc778\uc0b0\uc544\uc5f0\uc2dc\uba58\ud2b8, \uce58\uba74\uc5f4\uad6c\uc804\uc0c9\uc81c"
Dim unescaped = Regex.Unescape(escaped)
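
For reference, here is a minimal console sketch of the same idea with the required Imports spelled out. The module name and the length checks are only illustrative, and whether the Hangul actually renders depends on the console font. Note that Regex.Unescape() also converts other backslash escapes (such as \n or \t), so it is best suited to input that contains nothing but this kind of escape.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module UnescapeDemo
    Sub Main()
        ' Ask the console for UTF-8 output; rendering still depends on the console font.
        Console.OutputEncoding = Encoding.UTF8

        ' In VB.NET this literal contains backslashes and hex digits, not Korean text.
        Dim escaped As String = "\uc778\uc0b0"

        ' Regex.Unescape turns each \uXXXX sequence into the character it names.
        Dim unescaped As String = Regex.Unescape(escaped)

        Console.WriteLine(escaped.Length)   ' 12: two sequences of 6 characters each
        Console.WriteLine(unescaped.Length) ' 2: one character per sequence
        Console.WriteLine(unescaped = ChrW(&HC778) & ChrW(&HC0B0)) ' True
        Console.WriteLine(unescaped)
    End Sub
End Module
```

The comparison uses ChrW() rather than a Korean literal so that the sketch does not depend on the encoding the source file is saved in.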

Upvotes: 1

scartag

Reputation: 17680

You don't have to do anything to convert it.

The text is in Korean (Hangul) characters.

Simply output it, I guess; that worked for me.

I simply did a Console.WriteLine() from LINQPad.

Each \uXXXX is the Unicode value of a specific character.
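
To illustrate that mapping in VB.NET, where the compiler does not interpret \u escapes in string literals, here is a rough sketch that decodes only the \uXXXX sequences at run time and leaves every other character alone. DecodeUnicodeEscapes is a hypothetical helper name, not a framework method, and the sketch assumes each escape has exactly four hex digits.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnicodeEscapeDecoder
    ' Hypothetical helper: replaces each \uXXXX sequence with the UTF-16 code unit it names.
    ' VB.NET string literals have no backslash escapes, so "\\u" here is the regex
    ' for a literal backslash followed by "u".
    Function DecodeUnicodeEscapes(input As String) As String
        Return Regex.Replace(input, "\\u([0-9A-Fa-f]{4})",
            Function(m As Match) ChrW(Convert.ToInt32(m.Groups(1).Value, 16)).ToString())
    End Function

    Sub Main()
        Dim decoded As String = DecodeUnicodeEscapes("\uc885\ud569 / C:\temp")

        ' The two escapes become two characters; the rest of the text,
        ' including the backslash in "C:\temp", is left untouched.
        Console.WriteLine(decoded.Length)               ' 12
        Console.WriteLine(decoded(0) = ChrW(&HC885))    ' True
        Console.WriteLine(decoded.EndsWith("C:\temp"))  ' True
    End Sub
End Module
```

Because each escape is converted to a single UTF-16 code unit, characters written as a surrogate pair (two consecutive \uXXXX sequences) should come out correctly as well.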

Upvotes: 1
