Bercovici Adrian

Reputation: 9380

Encoding a UTF-8 char to a byte: table or formula?

Hello, I want to hard-code the byte values of some UTF-8 characters.

E.g. '$', '-', '+'.

For '$', how is the byte value calculated from this:

    symbol   char     octal code point   binary code point   binary UTF-8
    $        U+0024   044                010 0100            00100100

Which value from these columns gets encoded to the byte?

public class Constants
{
    public const byte dollar = /* value picked from which column? */;
    public const byte minus = /* value picked from which column? */;
}

Which of the columns above should I look at to get the byte value?
Is there a formula relating the char column value to the byte value?

Upvotes: 0

Views: 1181

Answers (2)

xanatos

Reputation: 111950

For ASCII chars (chars in the range 0-127), you can simply cast them:

public const byte dollar = (byte)'$';

Alternatively, write the code point as a hex literal:

public const byte dollar = 0x0024;

So use the char column: remove the U+ and add a 0x. This is valid only for characters in the range 0x0000-0x007F, where the UTF-8 byte is equal to the code point.

Note that there is no difference in the compiled code (you can check this on SharpLab):

public const byte dollar = (byte)'$';
public const byte dollar2 = 0x0024;

gets compiled to:

.field public static literal uint8 dollar = uint8(36)
.field public static literal uint8 dollar2 = uint8(36)

With C# 7.0, if you hate the world and you want to obfuscate your code, you can write:

public const byte dollar = 0b00100100;

(C# 7.0 added binary literals; 0b is the prefix.)
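To convince yourself that the three spellings really are the same constant, you can compare them directly. This is only a small sketch; the class name `DollarForms` and its member names are made up for the illustration.

```csharp
using System;

public static class DollarForms
{
    // Three spellings of the same value, 36 (hypothetical names).
    public const byte FromChar = (byte)'$';     // cast from the ASCII char
    public const byte FromHex = 0x24;           // code point U+0024 as a hex literal
    public const byte FromBinary = 0b0010_0100; // binary literal with a digit separator (C# 7.0+)

    // Returns true when all three constants hold the same byte value.
    public static bool AllEqual()
    {
        return FromChar == FromHex && FromHex == FromBinary;
    }
}
```

Calling `DollarForms.AllEqual()` returns `true`, and each constant has the value 36.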

Upvotes: 1

Patrick Hofman

Reputation: 157146

The characters you refer to are all in the ASCII range, and UTF-8 encodes every ASCII character as a single byte with the same value as its code point. (UTF-8 uses two to four bytes only for characters outside the ASCII character set.)

Because of that, you can just cast them:

public const byte dollar = (byte)'$';

If you need a character outside the ASCII range, store its UTF-8 bytes as an array:

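// The trademark sign is U+2122; its UTF-8 encoding is E2 84 A2.
public static readonly byte[] trademark = new byte[] { 226, 132, 162 };

Or, more explicit, but slightly worse for performance, since the bytes are computed when the type is initialized:

public static readonly byte[] trademark = Encoding.UTF8.GetBytes("\u2122");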
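As for a formula between the code point and the bytes: UTF-8 copies the bits of the code point into one to four bytes, depending on its size. Below is a minimal sketch of that rule, not production code; `Utf8Sketch`, `Encode` and `MatchesFramework` are hypothetical names, and in real code you would normally just call `Encoding.UTF8.GetBytes`.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8Sketch
{
    // Encodes one Unicode code point (0..0x10FFFF, surrogates excluded) as UTF-8.
    public static byte[] Encode(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        if (codePoint < 0x80)      // 1 byte: 0xxxxxxx (ASCII, byte == code point)
            return new[] { (byte)codePoint };

        if (codePoint < 0x800)     // 2 bytes: 110xxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xC0 | (codePoint >> 6)),
                (byte)(0x80 | (codePoint & 0x3F)),
            };

        if (codePoint < 0x10000)   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xE0 | (codePoint >> 12)),
                (byte)(0x80 | ((codePoint >> 6) & 0x3F)),
                (byte)(0x80 | (codePoint & 0x3F)),
            };

        return new[]               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        {
            (byte)(0xF0 | (codePoint >> 18)),
            (byte)(0x80 | ((codePoint >> 12) & 0x3F)),
            (byte)(0x80 | ((codePoint >> 6) & 0x3F)),
            (byte)(0x80 | (codePoint & 0x3F)),
        };
    }

    // Cross-checks the sketch against the framework's own UTF-8 encoder.
    public static bool MatchesFramework(int codePoint)
    {
        byte[] expected = Encoding.UTF8.GetBytes(char.ConvertFromUtf32(codePoint));
        return expected.SequenceEqual(Encode(codePoint));
    }
}
```

For example, `Utf8Sketch.Encode(0x24)` gives the single byte 0x24 for '$', while `Utf8Sketch.Encode(0x2122)` gives the three bytes 0xE2, 0x84, 0xA2 for the trademark sign.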

Upvotes: 1
