Reputation: 21
I typed this into the Node.js console:
new Buffer(new Buffer([0xde]).toString('utf8'), 'utf8')
and it prints out
<Buffer ef bf bd>
After looking at the docs, it seems that this should produce an identical buffer. I'm creating a UTF-8 string from a buffer whose contents consist of one byte (0xde), then using that string to create a new buffer. Am I missing something here?
Upvotes: 2
Views: 2750
Reputation: 106746
For encodings that can be multi-byte, you cannot expect to get the same bytes back that you started with in all cases. In the case of UTF-8, some characters require more than one byte to be represented properly.
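To make the multi-byte point concrete, here is a small sketch that counts how many bytes a few characters take in UTF-8. The sample characters and the variable names are mine, chosen only for illustration:

```javascript
// Three sample characters, written as escapes so the source file's own
// encoding cannot affect the result: 'a', U+00E9 (e acute), U+20AC (euro sign).
const samples = ['a', '\u00e9', '\u20ac'];

// Buffer.byteLength reports how many bytes the string occupies in the given encoding.
const byteLengths = samples.map((ch) => Buffer.byteLength(ch, 'utf8'));

samples.forEach((ch, i) => {
  console.log(JSON.stringify(ch), '->', byteLengths[i], 'byte(s)');
});
```

Each of these strings is a single character, yet they take 1, 2, and 3 bytes respectively.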
In your example, 0xde exceeds 0x7f, which is the largest value that UTF-8 can represent in a single byte. A byte of 0xde is the lead byte of a two-byte sequence, so it must be followed by a continuation byte. When you call .toString('utf8'), node sees that it only has one byte and instead returns the UTF-8 character \uFFFD (0xef, 0xbf, 0xbd in hex), which is used to denote an unknown/unrepresentable character. Reading this "replacement character" back into a new Buffer is then no problem, as it is a valid UTF-8 character.
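Here is a minimal sketch of the round trip from the question, next to two ways to get the original byte back. It uses Buffer.from rather than new Buffer(...), which is deprecated in current Node versions. The choice of 'latin1' and 'hex' as byte-preserving encodings is my suggestion, not something from the question, and the variable names are only illustrative:

```javascript
const original = Buffer.from([0xde]);

// Decoding a lone 0xde as UTF-8 is invalid, so the decoder substitutes U+FFFD.
const asUtf8 = original.toString('utf8');
const roundTripUtf8 = Buffer.from(asUtf8, 'utf8');

// 'latin1' maps every byte 0x00-0xff to one character, so no byte is lost.
const roundTripLatin1 = Buffer.from(original.toString('latin1'), 'latin1');

// 'hex' is another lossless option if the string only needs to carry bytes.
const roundTripHex = Buffer.from(original.toString('hex'), 'hex');

// A complete two-byte sequence (0xde followed by a continuation byte) survives UTF-8.
const valid = Buffer.from([0xde, 0x80]);
const roundTripValid = Buffer.from(valid.toString('utf8'), 'utf8');

console.log(roundTripUtf8);   // <Buffer ef bf bd>
console.log(roundTripLatin1); // <Buffer de>
console.log(roundTripHex);    // <Buffer de>
console.log(roundTripValid);  // <Buffer de 80>
```

If the data is arbitrary binary rather than text, it is usually simpler to keep it in a Buffer and only convert to a string with an encoding like 'hex' or 'base64' when you actually need one.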
Upvotes: 4