user34537


How do I use 32-bit Unicode characters in C#?

Maybe I don't need 32-bit strings, but I need to represent 32-bit characters.

For example: http://www.fileformat.info/info/unicode/char/1f4a9/index.htm I installed the Symbola font and can see the character when I paste it (in the URL bar or any text area), so I know I have font support for it.

But how do I support it in my C#/.NET app?

-edit- I'll add something. When I paste that character into my .NET WinForms app, I do NOT see it rendered correctly. When I paste it into Firefox, I do see it correctly. How do I get such characters to display correctly in my WinForms apps?

Upvotes: 4

Views: 7079

Answers (3)

Josh Gallagher

Reputation: 5329

If the question is actually,

How do I put the 'pile of poo' emoji, U+1F4A9, into a C# string literal, given that it takes two 16-bit code units (32 bits) to represent in UTF-16?

then the answer is:

"\U0001F4A9"

In the C# Interactive window in Visual Studio this shows the following output:

Screenshot of C# interactive window in Visual Studio showing that the escape sequence will print out a single pile of poo emoji, 💩, when evaluated.

Note the use of the upper case \U escape code. This must be followed by exactly eight hexadecimal digits, unlike \u, which must be followed by exactly four hexadecimal digits. See Unicode Character Escape Sequences in the C# language reference.

Also note that "\U0001F4A9".Length evaluates to 2, because Length always returns the number of 16-bit char values (UTF-16 code units) in the string, not the number of Unicode code points.

When the string is printed out, you should see only one character as long as encoding translation has been performed correctly along the way.
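To make the difference between Length and the number of Unicode characters concrete, here is a minimal sketch. The class name CodePointDemo and the helper CountCodePoints are hypothetical names made up for this illustration; char.IsSurrogatePair and char.ConvertToUtf32 are existing .NET methods.

```csharp
using System;

static class CodePointDemo
{
    // Hypothetical helper: counts Unicode code points by treating each
    // valid surrogate pair (two chars) as a single unit.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // A high surrogate followed by a low surrogate is one code point,
            // so skip the second half of the pair.
            if (char.IsSurrogatePair(s, i))
                i++;
            count++;
        }
        return count;
    }

    static void Main()
    {
        string poo = "\U0001F4A9";
        Console.WriteLine(poo.Length);                                // 2 (UTF-16 code units)
        Console.WriteLine(CountCodePoints(poo));                      // 1 (code point)
        Console.WriteLine(char.ConvertToUtf32(poo, 0).ToString("X")); // 1F4A9
    }
}
```

This counts code points only; a user-perceived character made of several code points (for example a base letter plus a combining mark) would still count as more than one.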

Note, U+1F4A9 was the example emoji linked in the OP's question.

Upvotes: 0

Mac

Reputation: 8339

I am not sure I understand your question:

  • Strings in .NET are UTF-16 encoded, and there is nothing you can do about this. If you want the UTF-32 version of a string, you will have to convert it into a byte array with the UTF32Encoding class.
  • Characters (char) in .NET are thus 16 bits long, and there is nothing you can do about this either. A code point that does not fit in one char, such as U+1F4A9, is stored in a string as a surrogate pair of two chars. As a standalone value it can be held in an int (see char.ConvertToUtf32) or in a 4-byte array produced by the UTF32Encoding class.
  • Every UTF-32 character has an equivalent UTF-16 representation, and vice versa. So in this context we should speak only of characters and of their different representations (encodings), UTF-16 being the representation of choice on the .NET platform.
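As a rough sketch of the conversions described in the list above: the class name Utf32Demo and its two helper methods are hypothetical wrappers made up for this example, while Encoding.UTF32, char.ConvertToUtf32 and char.ConvertFromUtf32 are existing .NET APIs. Encoding.UTF32 uses little-endian byte order.

```csharp
using System;
using System.Text;

static class Utf32Demo
{
    // Hypothetical wrapper: encodes a .NET (UTF-16) string as UTF-32 bytes,
    // four bytes per code point.
    public static byte[] ToUtf32Bytes(string s) => Encoding.UTF32.GetBytes(s);

    // Hypothetical wrapper: decodes UTF-32 bytes back into a .NET string.
    public static string FromUtf32Bytes(byte[] bytes) => Encoding.UTF32.GetString(bytes);

    static void Main()
    {
        string poo = "\U0001F4A9";

        byte[] utf32 = ToUtf32Bytes(poo);
        Console.WriteLine(utf32.Length);                 // 4
        Console.WriteLine(BitConverter.ToString(utf32)); // A9-F4-01-00

        // The same code point as a plain int, without going through bytes.
        int codePoint = char.ConvertToUtf32(poo, 0);
        Console.WriteLine(codePoint == 0x1F4A9);                    // True

        // Both routes round-trip back to the original two-char string.
        Console.WriteLine(char.ConvertFromUtf32(codePoint) == poo); // True
        Console.WriteLine(FromUtf32Bytes(utf32) == poo);            // True
    }
}
```

If all you need is the numeric code point, the int route is usually simpler than a byte array; the byte array matters mainly when writing UTF-32 to a file or stream.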

Upvotes: 9

svick

Reputation: 244988

You didn't say what exactly you mean by “support”. But there is nothing special you need to do to work with characters that don't fit into one 16-bit char, unless you do string manipulation. They will just be represented as surrogate pairs, but you shouldn't need to know about that if you treat the string as a whole.

One exception is that some string manipulation methods won't work correctly, because they index by 16-bit char rather than by code point. For example, "\U0001F4A9".Substring(1) returns only the second half of the surrogate pair (a lone low surrogate), which is not valid UTF-16 text.
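One possible way around the Substring problem is to cut at text-element boundaries instead of char indexes. The sketch below assumes that approach; the class SafeSubstringDemo and the helper SkipTextElements are hypothetical names for this illustration, built on the existing StringInfo.ParseCombiningCharacters method, which returns the starting char index of each text element.

```csharp
using System;
using System.Globalization;

static class SafeSubstringDemo
{
    // Hypothetical helper: skips `count` text elements rather than `count` chars,
    // so a surrogate pair is never split in half.
    public static string SkipTextElements(string s, int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Start index (in chars) of every text element in the string.
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        if (count >= starts.Length)
            return string.Empty;
        return s.Substring(starts[count]);
    }

    static void Main()
    {
        // Naive char-based Substring leaves a lone low surrogate behind.
        string broken = "\U0001F4A9".Substring(1);
        Console.WriteLine(char.IsLowSurrogate(broken[0]));          // True

        string s = "a\U0001F4A9b";
        Console.WriteLine(SkipTextElements(s, 1) == "\U0001F4A9b"); // True
        Console.WriteLine(SkipTextElements(s, 2));                  // b
    }
}
```

The built-in StringInfo.SubstringByTextElements method does similar work if you prefer not to write a helper.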

Upvotes: 3
