pimvdb

Reputation: 154818

Node.js buffer encoding issue

I'm having trouble understanding character encoding in node.js. I'm transmitting data and for some reason the encoding causes certain characters to be replaced with other ones. What I'm doing is base 64 encoding at the client side and decoding it in node.js.

To simplify, I narrowed it down to this piece of code which fails:

new Buffer("1w==", 'base64').toString('utf8');

The 1w== is the base 64 encoding of the × character. Now, when passing this string with the 'base64' argument to a buffer and then doing .toString('utf8') I expected to get the same character back, but I didn't. Instead I got (character code 65533).

Is the encoding utf8 wrong? If so, what should I use instead? If not, how can I decode a base 64 string in node.js?

Upvotes: 5

Views: 6326

Answers (2)

user1089933

Reputation: 97

echo -n x | base64

gives

eA==

The given code would give the expected answer if the data had been encoded as UTF-8 before it was base64 encoded. The problem is likely on the encoding side: 1w== decodes to the single byte 0xD7, which in UTF-8 is only the lead byte of a two-byte sequence and not a complete character.
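To make the difference on the encoding side concrete, here is a minimal sketch, assuming a Node.js version that provides Buffer.from and the 'latin1' encoding name (Node's name for ISO-8859-1). The variable names are only illustrative. It encodes × under both character encodings and base64 encodes each result:

```javascript
// Encode "×" (U+00D7) under two character encodings, then base64 each result.
const asLatin1 = Buffer.from("\u00d7", "latin1"); // one byte: d7
const asUtf8 = Buffer.from("\u00d7", "utf8");     // two bytes: c3 97

const latin1Base64 = asLatin1.toString("base64"); // "1w=="
const utf8Base64 = asUtf8.toString("base64");     // "w5c="

console.log(asLatin1.toString("hex"), latin1Base64);
console.log(asUtf8.toString("hex"), utf8Base64);
```

If the client had produced w5c=, decoding with 'utf8' in node.js would return × as expected. Since it produced 1w==, the client is presumably treating the text as single-byte characters.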

Upvotes: 0

Roland Illig

Reputation: 41617

No, your assumption is wrong. The base64-encoded string obviously encodes only one byte, and every Unicode code point above U+007F needs at least two bytes when encoded in UTF-8.

I'm still not good at decoding base64 in my head, but try ISO-8859-1 instead.

The point is that base64 decoding transforms a character string into a byte string. You assumed that it decodes to a character string, but this is wrong. You still need to decode the byte string into a character string, and in your case the correct encoding is ISO-8859-1.
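A minimal sketch of the two steps in node.js, assuming a version where Buffer.from is available (it replaces the deprecated new Buffer constructor) and where ISO-8859-1 goes by the encoding name 'latin1'. The variable names are only illustrative:

```javascript
// Step 1: base64 text -> raw bytes.
const bytes = Buffer.from("1w==", "base64");
console.log(bytes.length);          // 1
console.log(bytes.toString("hex")); // d7

// Step 2: raw bytes -> characters, using the encoding the sender used.
const decodedLatin1 = bytes.toString("latin1"); // "×" (U+00D7)
const decodedUtf8 = bytes.toString("utf8");     // "\ufffd", since 0xD7 alone is not valid UTF-8

console.log(decodedLatin1, decodedLatin1.charCodeAt(0)); // × 215
console.log(decodedUtf8.charCodeAt(0));                  // 65533
```

The longer-term fix may be to have the client encode the text as UTF-8 before base64 encoding it, so that both sides agree on one encoding for all characters.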

Upvotes: 4
