Reputation: 2528
Can anyone tell me what is going on here?
byte[] stamp = new byte[]{0,0,0,0,0,1,177,115};
string serialize = System.Text.Encoding.UTF8.GetString(stamp);
byte[] deserialize = System.Text.Encoding.UTF8.GetBytes(serialize);
//deserialize == byte[]{0,0,0,0,0,1,239,191,189,115}
Why is stamp != deserialize?
Upvotes: 4
Views: 316
Reputation: 1544
In your original byte array, you have the byte 177, which is the code point of the plus-minus sign (±). However, during the serialization that byte isn't being recognized. It's being replaced by 239 191 189, which is the UTF-8 encoding of the REPLACEMENT CHARACTER (U+FFFD).
Here's a chart for reference. http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280&utf8=dec
I'm not quite sure WHY the plus-minus sign isn't recognized, but that substitution is why the byte arrays aren't equal. Apart from it, the arrays are identical: every other byte survives the round trip unchanged.
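To see the substitution directly, here is a minimal sketch that decodes the array from the question and inspects the result. The class name `ReplacementDemo` and the helper `RoundTripUtf8` are made up for illustration; only `Encoding.UTF8.GetString` and `Encoding.UTF8.GetBytes` come from the question.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
    // Hypothetical helper: decode the bytes as UTF-8, then encode the string back.
    public static byte[] RoundTripUtf8(byte[] input)
    {
        string text = Encoding.UTF8.GetString(input);
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        byte[] stamp = { 0, 0, 0, 0, 0, 1, 177, 115 };
        string text = Encoding.UTF8.GetString(stamp);

        // The lone byte 177 decodes to U+FFFD, the REPLACEMENT CHARACTER.
        Console.WriteLine(((int)text[6]).ToString("X4")); // FFFD

        // U+FFFD itself encodes to three bytes in UTF-8.
        Console.WriteLine(string.Join(",", Encoding.UTF8.GetBytes("\uFFFD"))); // 239,191,189

        // So the round trip grows from 8 bytes to 10.
        Console.WriteLine(string.Join(",", RoundTripUtf8(stamp))); // 0,0,0,0,0,1,239,191,189,115
    }
}
```

Once the byte has become U+FFFD there is nothing in the string that records it was originally 177, so the original array cannot be rebuilt from the string.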
Upvotes: 5
Reputation: 111219
The array of bytes does not encode a valid text string in UTF-8, so when you "serialize" it the parts that can't be recognized are replaced by a "replacement character." If you must convert byte arrays into strings you should find an encoding that does not have restrictions like this, such as ISO-8859-1.
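As a rough sketch of the ISO-8859-1 suggestion, the round trip from the question could look like this. The class name `Latin1RoundTrip` and the `Serialize`/`Deserialize` helpers are illustrative names, not part of any library; `Encoding.GetEncoding("ISO-8859-1")` is the standard .NET call.

```csharp
using System;
using System.Text;

public static class Latin1RoundTrip
{
    // ISO-8859-1 maps each byte value 0-255 to the code point with the same
    // number, so every byte array decodes to a string and encodes back unchanged.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical helpers named after the variables in the question.
    public static string Serialize(byte[] data) => Latin1.GetString(data);
    public static byte[] Deserialize(string text) => Latin1.GetBytes(text);

    public static void Main()
    {
        byte[] stamp = { 0, 0, 0, 0, 0, 1, 177, 115 };
        byte[] roundTripped = Deserialize(Serialize(stamp));
        Console.WriteLine(string.Join(",", roundTripped)); // 0,0,0,0,0,1,177,115
    }
}
```

Note that the resulting string still contains NUL and other control characters, so this only helps if whatever stores or transmits the string leaves those characters untouched.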
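In particular, the byte 177 cannot appear on its own in valid UTF-8: bytes in the range 128-191 are "continuation bytes" that can appear only as part of a multi-byte sequence started by a lead byte in the range 194-244. Here 177 directly follows the byte 1, which is a complete single-byte character, so there is no sequence for it to continue. You can read more about UTF-8 here: https://en.wikipedia.org/wiki/UTF-8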
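The continuation-byte rule can be checked in code. The sketch below assumes you would rather get an error than a silent substitution; `Utf8Check`, `IsContinuationByte`, and `IsValidUtf8` are illustrative names, while the `UTF8Encoding(bool, bool)` constructor and `DecoderFallbackException` are standard .NET.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // Continuation bytes have the bit pattern 10xxxxxx, i.e. values 128-191.
    public static bool IsContinuationByte(byte b) => (b & 0xC0) == 0x80;

    // Returns true when the bytes form valid UTF-8. Passing
    // throwOnInvalidBytes: true makes the decoder throw instead of
    // substituting U+FFFD for bytes it cannot decode.
    public static bool IsValidUtf8(byte[] data)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] stamp = { 0, 0, 0, 0, 0, 1, 177, 115 };
        Console.WriteLine(IsContinuationByte(177)); // True
        Console.WriteLine(IsValidUtf8(stamp));      // False

        // 194 followed by 177 is the valid two-byte encoding of U+00B1 (±).
        Console.WriteLine(IsValidUtf8(new byte[] { 194, 177 })); // True
    }
}
```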
Upvotes: 4