K. R.

Reputation: 1260

System.Text.Encoding.UTF8.GetBytes Extra Byte

Why does this line

System.Text.Encoding.UTF8.GetBytes("ABCD±ABCD")

give me back 10 bytes instead of 9, even though ± is char(177)?

Is there a .NET function or encoding that will translate this string correctly into 9 bytes?

Upvotes: 2

Views: 5916

Answers (3)

L.B

Reputation: 116138

You should use the Windows-1251 encoding to get ± as the single byte 177:

var bytes = System.Text.Encoding.GetEncoding("Windows-1251").GetBytes("ABCD±ABCD");
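A minimal sketch of the same idea with a different single-byte encoding. It assumes ISO-8859-1 (Latin-1) is acceptable for your data; that encoding also maps ± to 177 (0xB1). On .NET Core and later, Windows-1251 is not available by default and needs `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` first, whereas ISO-8859-1 is built in. The class and method names here are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class SingleByteDemo
{
    // Encodes text as ISO-8859-1, where every char in U+0000..U+00FF
    // becomes exactly one byte. Chars outside that range are replaced
    // with '?' by the default encoder fallback.
    public static byte[] EncodeLatin1(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
    }
}
```

Calling `SingleByteDemo.EncodeLatin1("ABCD\u00B1ABCD")` should give 9 bytes, with 0xB1 at index 4. Whoever reads the bytes must decode them with the same encoding.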

Upvotes: 6

Daniel A. White

Reputation: 190943

± falls outside the ASCII range, so UTF-8 represents it with 2 bytes.

Upvotes: 2

Marc Gravell

Reputation: 1062865

"Although ± is char(177)"

And the UTF-8 encoding for that is 0xC2 0xB1: two bytes. Basically, every code point >= 128 will take multiple bytes, where the number of bytes depends on the magnitude of the code point.

That data is 10 bytes, when encoded with UTF-8. The error here is your expectation that it should take 9.
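To see this for yourself, here is a small sketch that dumps the UTF-8 bytes as hex and reports the encoded length. The class and method names are invented for the example; only `Encoding.UTF8` and `BitConverter.ToString` are framework calls.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class Utf8Demo
{
    // Returns the UTF-8 bytes of text as space-separated uppercase hex.
    public static string ToUtf8Hex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        // BitConverter.ToString yields "41-42-..."; swap the dashes for spaces.
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Returns how many bytes text occupies when encoded as UTF-8.
    public static int Utf8Length(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

`Utf8Demo.ToUtf8Hex("ABCD\u00B1ABCD")` returns `41 42 43 44 C2 B1 41 42 43 44`, which is 10 bytes. `Utf8Demo.Utf8Length` gives 1 for `"A"`, 2 for `"\u00B1"` (±), and 3 for `"\u20AC"` (€), which shows the byte count growing with the code point.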

Upvotes: 8
