Reputation: 449

Encoding issue with String to Bytes conversion

I am trying to convert a string to bytes and back again. I have seen the earlier question on this site about converting a string to a byte array, but my problem is different.

Here is my code:

byte[] btest = new byte[2];
btest[0] = 0xFF;
btest[1] = 0xAA;
UTF8Encoding enc = new UTF8Encoding();
string str = enc.GetString(btest); // str is now "��"

// btest was a 2-byte array with the contents above.
// Now I try to convert the string back to a byte array.
byte[] bst = enc.GetBytes(str); // bst has 6 elements:
// {239, 191, 189, 239, 191, 189}

// Next I try to restore btest by reading the string one character at a time.
btest[0] = Convert.ToByte(str[0]); // this line throws
// Exception: Value was either too large or too small for an unsigned byte.
btest[1] = Convert.ToByte(str[1]);

Shouldn't GetBytes return a byte array of size 2? What am I doing wrong? I want bst[0] to contain the same value that I assigned to btest[0].

Thanks

Upvotes: 0

Answers (2)

Thorarin

Reputation: 48496

Your original byte input is not valid UTF-8: 0xFF never occurs in UTF-8, and 0xAA is a continuation byte with no lead byte before it, so neither byte represents a Unicode code point. As a result, each invalid byte is decoded to � (U+FFFD, the replacement character). That is a character like any other, so converting it back to bytes does not reproduce your original invalid sequence; it produces the proper UTF-8 encoding of that code point (239, 191, 189), twice.

The character U+FFFD has the numeric value 65533, which cannot be represented as a single byte (maximum 255), hence Convert.ToByte throws an OverflowException.
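To make the two points above concrete, here is a minimal sketch that decodes the question's bytes and prints what comes out. The class and method names (ReplacementDemo, DecodeInvalid) are made up for illustration; only the System.Text calls come from the framework.

```csharp
using System;
using System.Text;

static class ReplacementDemo
{
    // Hypothetical helper: decodes the question's two bytes with the
    // default (lenient) UTF-8 decoder, which substitutes U+FFFD.
    public static string DecodeInvalid()
    {
        byte[] invalid = { 0xFF, 0xAA };
        return new UTF8Encoding().GetString(invalid);
    }

    static void Main()
    {
        string str = DecodeInvalid();

        foreach (char c in str)
        {
            // Print each character's code point in hex and decimal.
            Console.WriteLine("U+{0:X4} = {1}", (int)c, (int)c);
        }

        // A char above byte.MaxValue (255) is what makes
        // Convert.ToByte(char) throw an OverflowException.
        Console.WriteLine(str[0] > byte.MaxValue);

        // Re-encoding produces three bytes per U+FFFD.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(str)));
    }
}
```

If you run this you should see the same six bytes the question reports, just written in hex.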

If you were to change your original input to a valid byte sequence, say:

btest[0] = 0xDF;
btest[1] = 0xBF;

You will see that the enc.GetBytes(str) call results in a two-byte array again, because 0xDF 0xBF is the valid UTF-8 encoding of a single code point (U+07FF).
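As a rough side-by-side check, the sketch below runs both byte sequences through the same decode and re-encode steps as the question. RoundTripDemo and RoundTrip are illustrative names, not part of any library.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: bytes -> string -> bytes using UTF-8,
    // mirroring the GetString / GetBytes calls from the question.
    public static byte[] RoundTrip(byte[] input)
    {
        var enc = new UTF8Encoding();
        string str = enc.GetString(input);
        return enc.GetBytes(str);
    }

    static void Main()
    {
        byte[] valid = { 0xDF, 0xBF };   // well-formed: encodes U+07FF
        byte[] invalid = { 0xFF, 0xAA }; // not well-formed UTF-8

        // The valid sequence survives unchanged.
        Console.WriteLine(BitConverter.ToString(RoundTrip(valid)));   // DF-BF

        // The invalid one comes back as two encoded U+FFFD characters.
        Console.WriteLine(BitConverter.ToString(RoundTrip(invalid))); // EF-BF-BD-EF-BF-BD
    }
}
```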

Upvotes: 1

zerkms

Reputation: 255045

The byte sequence 0xFF 0xAA is not valid in the UTF-8 encoding, so it is decoded to �
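If you would rather be told about invalid input than get � silently, one option (a sketch, not the only way) is the UTF8Encoding constructor overload that takes throwOnInvalidBytes. The wrapper names below (StrictDecodeDemo, IsValidUtf8) are hypothetical.

```csharp
using System;
using System.Text;

static class StrictDecodeDemo
{
    // Hypothetical helper: returns true when the bytes decode as
    // well-formed UTF-8, false otherwise.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // First argument: do not emit a BOM. Second: throw on invalid bytes
        // instead of substituting U+FFFD.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsValidUtf8(new byte[] { 0xFF, 0xAA })); // False
        Console.WriteLine(IsValidUtf8(new byte[] { 0xDF, 0xBF })); // True
    }
}
```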

References:

See the valid byte ranges on the corresponding Wikipedia page: http://en.wikipedia.org/wiki/UTF-8#Description

Upvotes: 0
