Matthew Romero

Reputation: 63

How to convert string with Unicode literal characters in it to a Unicode string

I am receiving data from an API (through C# code) in its literal form. Some of this data has non-ASCII characters in it. One example is shown below:

string universityName = "Universidad de M\u00e1laga";

I will be inserting this data into a SQL Server database, and would like to insert the Unicode encoded version, not the literal version. To do so, I need to encode the string correctly before inserting it. It should look like:

Universidad de Málaga

I've looked around Stack Overflow but can't seem to find a related question, so I thought I'd ask. Is there a built-in C# library that allows me to give it the original string and have it return the desired string? If not, is there a process I should follow?

I've already tried using Encoding.Unicode.GetBytes to get the bytes of the string and then convert it back into a string, but it doesn't seem to work for me. I could be using it wrong too.

Upvotes: 6

Views: 2762

Answers (2)

TheGeneral

Reputation: 81493

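There are a number of ways to do this; the following might work for you.

Disclaimer: this assumes your string really contains the escape sequence as literal text (a backslash, a "u", and four hex digits), i.e. it looks like this in your db: Universidad de M\u00e1laga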

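using System;
using System.Text.RegularExpressions;

var test1 = "Universidad de M\\u00e1laga";  // doubled backslash: the string holds a literal \u00e1
var test2 = Regex.Unescape(test1);          // converts escape sequences into the characters they represent
Console.WriteLine(test1);
Console.WriteLine(test2);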

Output

Universidad de M\u00e1laga
Universidad de Málaga

Note: This may be pointing to an overall structural or design problem with the whole situation. Then again, who knows what an API will give you back.
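One caveat worth keeping in mind: Regex.Unescape converts every escape sequence it recognises, not only \uXXXX, so a literal \t or \n in the data would also be turned into a tab or a newline. If that is a concern, a narrower approach is to replace only the \uXXXX sequences. The following is a minimal sketch under that assumption; UnicodeEscapes and Decode are hypothetical names used for illustration, not framework APIs.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Hypothetical helper (not a framework API): decodes only \uXXXX sequences
    // and leaves every other backslash in the input untouched.
    public static string Decode(string input)
    {
        return Regex.Replace(
            input,
            @"\\u([0-9a-fA-F]{4})",
            // Parse the four hex digits and emit the matching UTF-16 code unit.
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}

public static class Program
{
    public static void Main()
    {
        // Verbatim strings (@"...") keep the backslash as a real character,
        // standing in for text that arrived from an API at run time.
        Console.WriteLine(UnicodeEscapes.Decode(@"Universidad de M\u00e1laga"));

        // Backslashes that are not part of a \uXXXX sequence stay as they are.
        Console.WriteLine(UnicodeEscapes.Decode(@"C:\temp\M\u00e1laga.txt"));
    }
}
```

This sketch does not treat a doubled backslash as an escaped backslash, so text that contains \\u00e1 on purpose would still be decoded. Whether that matters depends on how the API escapes its output.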


Upvotes: 4

Theodor Zoulias

Reputation: 43464

The string you are showing contains a Unicode character escape sequence, which is a way of encoding characters inside C# string literals. It is used mainly for non-printable characters, but it can be used for any character. For example, all the strings below are equal:

"ab"
"\u0061b"
"a\u0062"
"\u0061\u0062"

You can confirm it like this:

Console.WriteLine("ab" == "\u0061b"); // True
Console.WriteLine("ab" == "a\u0062"); // True
Console.WriteLine("ab" == "\u0061\u0062"); // True

In your case:

Console.WriteLine("M\u00e1laga" == "Málaga"); // True

Long story short, you don't have to do anything. Your string is perfectly fine. Just store it in the DB!
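This holds when the escape sequence appears in a C# string literal in source code, because the compiler decodes it. If the text arrives at run time with a real backslash in it, the characters \u00e1 are still there and the string is five characters longer. The sketch below is one way to tell the two cases apart; EscapeCheck and HasLiteralEscape are hypothetical names, and the verbatim string is only a stand-in for raw API output.

```csharp
using System;

public static class EscapeCheck
{
    // Hypothetical helper: true when the text still contains a literal
    // backslash followed by 'u', i.e. the escape has not been decoded.
    public static bool HasLiteralEscape(string s) => s.Contains(@"\u");
}

public static class Program
{
    public static void Main()
    {
        string fromLiteral = "M\u00e1laga";  // the compiler decodes \u00e1 here
        string fromWire = @"M\u00e1laga";    // verbatim: the backslash is kept as a character

        Console.WriteLine(fromLiteral.Length);                        // 6
        Console.WriteLine(fromWire.Length);                           // 11
        Console.WriteLine(EscapeCheck.HasLiteralEscape(fromLiteral)); // False
        Console.WriteLine(EscapeCheck.HasLiteralEscape(fromWire));    // True
    }
}
```

Inspecting the string's Length in a debugger or log gives the same information without any helper.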

Upvotes: 0
