satyender

Reputation: 837

How to convert Unicode to UTF-8 encoding in C#

I want to convert a Unicode string to a UTF-8 string so that I can pass it to an SMS API to send Unicode SMS messages. I want the same conversion that this tool performs: https://cafewebmaster.com/online_tools/utf8_encode

eg. I have unicode string "हैलो फ़्रेंड्स" and it should be converted into "हà¥à¤²à¥ à¥à¥à¤°à¥à¤à¤¡à¥à¤¸"

I have tried this, but I am not getting the expected output:

    private string UnicodeToUTF8(string strFrom)
    {
        byte[] bytes = Encoding.Default.GetBytes(strFrom);
        return Encoding.UTF8.GetString(bytes);
    }

and I am calling the function like this:

string myUTF8String = UnicodeToUTF8("हैलो फ़्रेंड्स");

Upvotes: 2

Views: 23804

Answers (2)

CodeWhore

Reputation: 981

Try this:

string output = "hello world";
byte[] bytes1 = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(output));
byte[] bytes2 = Encoding.Convert(Encoding.Unicode, Encoding.Unicode, Encoding.Unicode.GetBytes(output));
var output1 = Encoding.UTF8.GetString(bytes1);
var output2 = Encoding.Unicode.GetString(bytes2);

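You will see that bytes1 is 11 bytes (UTF-8 uses 1 byte per character for ASCII text such as this) and bytes2 is 22 bytes (Encoding.Unicode is UTF-16, which uses 2 bytes per character here). Non-ASCII text is different: Devanagari characters take 3 bytes each in UTF-8.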
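To check the byte counts without converting between arrays, here is a minimal sketch using `Encoding.GetByteCount`. The class and method names (`ByteCountDemo`, `Utf8Bytes`, `Utf16Bytes`) are made up for illustration, and the single Devanagari character is just a small sample taken from the question's string.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    // Number of bytes the string occupies when encoded as UTF-16 (Encoding.Unicode).
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);

    static void Main()
    {
        // "hello world": 11 bytes in UTF-8, 22 bytes in UTF-16.
        // "ह" (U+0939):  3 bytes in UTF-8,  2 bytes in UTF-16.
        foreach (var s in new[] { "hello world", "ह" })
        {
            Console.WriteLine($"{s.Length} chars: UTF-8 = {Utf8Bytes(s)} bytes, UTF-16 = {Utf16Bytes(s)} bytes");
        }
    }
}
```

So the "1 byte per char" figure only holds for ASCII input; for the text in the question, the UTF-8 form is actually larger than the UTF-16 form.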

Upvotes: 1

Kyle

Reputation: 6684

I don't think this is possible to answer concretely without knowing more about the SMS API you want to use. The string type in C# is UTF-16. If you want a different encoding, it's given to you as a byte[] (because a string is UTF-16, always).

You could 'cast' that into a string by doing something like this:

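using System.Linq;
using System.Text;

static string UnicodeToUTF8(string from) {
    // Encode to UTF-8, then widen each byte to one char (U+0000..U+00FF).
    var bytes = Encoding.UTF8.GetBytes(from);
    return new string(bytes.Select(b => (char)b).ToArray());
}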

As far as I can tell this yields the same output as the website you linked. However, without knowing what API you're handing this string off to, I can't guarantee that this will ultimately work.

The point of string is that we don't need to worry about its underlying encoding. This casting operation is kind of a giant hack, and it no longer guarantees that the string holds well-formed text.
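To make the hack concrete, here is a rough sketch showing what the cast produces and that it can be undone. `UTF8ToUnicode` is a hypothetical helper that is not part of the code above, and it is only meaningful for strings that came out of `UnicodeToUTF8`. The cast amounts to decoding the UTF-8 bytes as ISO-8859-1, so every resulting char is in the range U+0000 to U+00FF.

```csharp
using System;
using System.Linq;
using System.Text;

static class CastHackDemo
{
    // The cast from the answer: each UTF-8 byte becomes one char (U+0000..U+00FF).
    public static string UnicodeToUTF8(string from) =>
        new string(Encoding.UTF8.GetBytes(from).Select(b => (char)b).ToArray());

    // Hypothetical inverse: narrow each char back to a byte, then decode as UTF-8.
    // Only valid for strings produced by UnicodeToUTF8.
    public static string UTF8ToUnicode(string hacked) =>
        Encoding.UTF8.GetString(hacked.Select(c => (byte)c).ToArray());

    static void Main()
    {
        string original = "ह";                    // U+0939, three bytes in UTF-8
        string hacked = UnicodeToUTF8(original);  // three chars: U+00E0, U+00A4, U+00B9

        Console.WriteLine(hacked.Length);                      // 3
        Console.WriteLine(UTF8ToUnicode(hacked) == original);  // True
    }
}
```

The hacked string is three chars long even though the original was one, which is why anything downstream that treats it as ordinary text (length limits, re-encoding) may behave unexpectedly.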

If something expects a UTF-8 encoding, it should accept a byte[], not a string.
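As a minimal sketch of the byte[] route, assuming the receiving side can take raw bytes: the names `Utf8BytesDemo` and `ToUtf8` are made up for illustration, and how the bytes are then handed to the SMS API is not known from the question.

```csharp
using System;
using System.Text;

static class Utf8BytesDemo
{
    // Encode a string as UTF-8 bytes (GetBytes does not prepend a byte order mark).
    public static byte[] ToUtf8(string s) => Encoding.UTF8.GetBytes(s);

    static void Main()
    {
        byte[] payload = ToUtf8("ह");

        // Hex dump of the encoded bytes, useful for checking what is actually sent.
        Console.WriteLine(BitConverter.ToString(payload));    // E0-A4-B9

        // Decoding with the same encoding recovers the original text.
        Console.WriteLine(Encoding.UTF8.GetString(payload) == "ह");  // True
    }
}
```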

Upvotes: 8
