Reputation: 449
I am trying to convert a string to bytes and back again. I have seen the previous question on this site about converting a string to a byte array, but my problem is different.
Here is my code:
byte[] btest = new byte[2];
btest[0] = 0xFF;
btest[1] = 0xAA;
UTF8Encoding enc = new UTF8Encoding();
string str = enc.GetString(btest); // here I get a string with the value str = "��"
// I had a byte array of size 2 with the contents above.
// Now I try to convert the string back to a byte array:
byte[] bst = enc.GetBytes(str); // at this step I get a byte array of size 6
// with the contents {239,191,189,239,191,189}
// Next I try to write the values back into btest by indexing into the string:
btest[0] = Convert.ToByte(str[0]); // this line throws an exception
// Exception: Value was either too large or too small for an unsigned byte.
btest[1] = Convert.ToByte(str[1]);
Shouldn't GetBytes return a byte array of size 2? What am I doing wrong? I want bst[0] to contain the same value that I assigned to btest[0].
Thanks
Upvotes: 0
Views: 1928
Reputation: 48496
Your original byte input is not valid UTF-8 (see here): it doesn't encode any Unicode code point. As a result, each invalid byte is decoded as � (U+FFFD, the replacement character). From then on it is a character like any other, so converting it back to bytes doesn't reproduce your initial invalid byte sequence. Instead you get the proper UTF-8 encoding of that code point, which is the three bytes 239, 191, 189, once for each of the two characters. That is where the six bytes come from.
The character has the value 0xFFFD (65533), which cannot be represented as a single byte, hence Convert.ToByte throws an OverflowException.
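To make this concrete, here is a minimal sketch that decodes the two bytes from the question and inspects the result. The class and method names are made up for illustration; only UTF8Encoding and Convert.ToByte come from the question.

```csharp
using System;
using System.Text;

public static class ReplacementCharDemo
{
    // Decodes the bytes from the question and returns the numeric value
    // of the first character in the resulting string.
    public static int FirstCharCode()
    {
        byte[] btest = { 0xFF, 0xAA };
        string str = new UTF8Encoding().GetString(btest);
        return str[0]; // U+FFFD, the replacement character
    }

    // Returns true if Convert.ToByte rejects the given character
    // because its value is larger than 255.
    public static bool ToByteOverflows(char c)
    {
        try
        {
            Convert.ToByte(c);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

Calling ReplacementCharDemo.ToByteOverflows('\uFFFD') should report the same overflow as in the question, while a character such as 'A' converts without error.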
If you were to change your original input to a valid byte sequence, say:
btest[0] = 0xDF;
btest[1] = 0xBF;
You will see that the enc.GetBytes(str) call actually results in a two-byte array again.
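A small sketch comparing the two inputs may help. The helper names below are hypothetical. The Base64 variant is not something the question asked for; it is shown on the assumption that the real goal is to carry arbitrary bytes through a string without loss, which UTF-8 decoding cannot guarantee.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // True if decoding the bytes as UTF-8 and encoding the string again
    // gives back exactly the same bytes. This only holds for valid UTF-8.
    public static bool Utf8RoundTrips(byte[] input)
    {
        var enc = new UTF8Encoding();
        byte[] output = enc.GetBytes(enc.GetString(input));
        return input.SequenceEqual(output);
    }

    // Base64 is a binary-to-text encoding, so any byte sequence survives
    // the trip through a string unchanged.
    public static bool Base64RoundTrips(byte[] input)
    {
        string text = Convert.ToBase64String(input);
        byte[] output = Convert.FromBase64String(text);
        return input.SequenceEqual(output);
    }
}
```

With { 0xDF, 0xBF } the UTF-8 round trip succeeds, with { 0xFF, 0xAA } it does not, and the Base64 round trip succeeds for both.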
Upvotes: 1
Reputation: 255045
The byte sequence 0xFF 0xAA is not valid UTF-8, so each of the two bytes is decoded as � (the replacement character).
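If silent replacement is not wanted, one option is to ask the decoder to throw instead. The sketch below uses the UTF8Encoding constructor overload that takes throwOnInvalidBytes; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictUtf8Demo
{
    // Returns true if the bytes decode as UTF-8 without any invalid sequences.
    public static bool IsValidUtf8(byte[] data)
    {
        // No BOM on encode, and throw on invalid input instead of
        // substituting the replacement character.
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

This way the bytes from the question are rejected up front rather than turning into � characters that cannot be converted back.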
Upvotes: 0