Reputation: 1260
Why does this line
System.Text.Encoding.UTF8.GetBytes("ABCD±ABCD")
give me back 10 bytes instead of 9, even though ± is char(177)?
Is there a .NET function or encoding that will encode this string into 9 bytes?
Upvotes: 2
Views: 5916
Reputation: 116138
You should use the Windows-1251 encoding to get ± as 177:
var bytes = System.Text.Encoding.GetEncoding("Windows-1251").GetBytes("ABCD±ABCD");
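A minimal sketch of the single-byte approach, assuming .NET 5 or later (it uses top-level statements). It swaps in ISO-8859-1 (Latin-1) for Windows-1251. Latin-1 maps U+0000 to U+00FF one-to-one onto single bytes, so ± becomes 0xB1 (177), and .NET Core / .NET 5+ supports it without any extra setup. Windows-1251 also maps ± to 0xB1, but on .NET Core / .NET 5+ GetEncoding("Windows-1251") only works after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). Whichever single-byte encoding you pick, characters outside its 256-character repertoire cannot be represented.

```csharp
using System;
using System.Text;

// ISO-8859-1 (Latin-1) maps code points U+0000..U+00FF directly to single bytes.
// (Swapped in for Windows-1251 because it needs no code-page provider on .NET Core.)
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// "\u00B1" is '±'; the escape avoids depending on the source file's encoding.
byte[] bytes = latin1.GetBytes("ABCD\u00B1ABCD");

Console.WriteLine(bytes.Length); // one byte per character
Console.WriteLine(bytes[4]);     // the byte for '±'
```

On .NET 5+ the Encoding.Latin1 property returns the same encoding without a name lookup.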
Upvotes: 6
Reputation: 190943
± falls outside the range of ASCII, so UTF-8 represents it with 2 bytes.
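To see where the extra byte comes from, here is a small sketch (assuming .NET 5+ for top-level statements) that asks Encoding.UTF8 for the byte count of each character. The ASCII letters take one byte each; ± takes two.

```csharp
using System;
using System.Text;

// "\u00B1" is '±'; the escape avoids depending on the source file's encoding.
string s = "ABCD\u00B1ABCD";

// Print the UTF-8 byte count of each UTF-16 char in the string.
foreach (char c in s)
{
    int n = Encoding.UTF8.GetByteCount(c.ToString());
    Console.WriteLine($"'{c}' U+{(int)c:X4} -> {n} byte(s)");
}

Console.WriteLine(s.Length);                      // number of UTF-16 chars
Console.WriteLine(Encoding.UTF8.GetByteCount(s)); // number of UTF-8 bytes
```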
Upvotes: 2
Reputation: 1062865
Although ± is char(177)
And the UTF-8 encoding for that is 0xC2 0xB1: two bytes. Basically, every code point >= 128 takes multiple bytes, and the number of bytes depends on the magnitude of the code point (two bytes up to U+07FF, three up to U+FFFF, four above that).
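The byte-count rule can be illustrated with a hand-rolled encoder for a single code point. This is a sketch for illustration only (the function name EncodeUtf8 is hypothetical, and it does not reject surrogate code points U+D800 to U+DFFF); in real code, use Encoding.UTF8. It assumes .NET 5+ for top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical illustrative encoder: turns one code point into its UTF-8 bytes.
// Does not validate surrogates; use Encoding.UTF8 in production code.
static byte[] EncodeUtf8(int cp)
{
    if (cp < 0x80)
        return new[] { (byte)cp };                            // 0xxxxxxx
    if (cp < 0x800)
        return new[] { (byte)(0xC0 | (cp >> 6)),              // 110xxxxx
                       (byte)(0x80 | (cp & 0x3F)) };          // 10xxxxxx
    if (cp < 0x10000)
        return new[] { (byte)(0xE0 | (cp >> 12)),             // 1110xxxx
                       (byte)(0x80 | ((cp >> 6) & 0x3F)),
                       (byte)(0x80 | (cp & 0x3F)) };
    if (cp <= 0x10FFFF)
        return new[] { (byte)(0xF0 | (cp >> 18)),             // 11110xxx
                       (byte)(0x80 | ((cp >> 12) & 0x3F)),
                       (byte)(0x80 | ((cp >> 6) & 0x3F)),
                       (byte)(0x80 | (cp & 0x3F)) };
    throw new ArgumentOutOfRangeException(nameof(cp));
}

// '±' is U+00B1 (177): 177 >> 6 = 2 gives 0xC2, 177 & 0x3F = 0x31 gives 0xB1.
Console.WriteLine(BitConverter.ToString(EncodeUtf8(0xB1)));                 // C2-B1
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u00B1"))); // C2-B1
```

The high bits of the leading byte (110, 1110, 11110) announce how long the sequence is, and every continuation byte starts with 10, which leaves only 7 payload bits for a one-byte sequence. That is why U+00B1 cannot fit in a single UTF-8 byte.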
That data is 10 bytes, when encoded with UTF-8. The error here is your expectation that it should take 9.
Upvotes: 8