Reputation: 13
let buf1 = Buffer.from("3", "utf8");
let buf2 = Buffer.from("Здравствуйте", "utf8");
// <Buffer 33>
// <Buffer d0 97 d0 b4 d1 80 d0 b0 d0 b2 d1 81 d1 82 d0 b2 d1 83 d0 b9 d1 82 d0 b5>
Why does char '3' encode to '33' in buf1 but 'd0 97' in buf2?
Upvotes: 1
Views: 133
Reputation: 881113
Because 3
is not З
, despite the similarity to the untrained eye. Look closer and you'll see the difference, however subtle.
The former is Unicode code point U+0033 - DIGIT THREE
(see here), while the latter is U+0417 - CYRILLIC CAPITAL LETTER ZE
(see here), encoded in UTF-8 as d0 97
.
The Russian word is actually hello, pronounced (very roughly, since I only know hello and goodbye, taught by a Russian girlfriend many decades ago) "Strasvoytza", with no "three" anywhere in the concept.
Upvotes: 2
Reputation: 199
The first character of the second buffer is the Cyrillic character "Ze" https://en.m.wikipedia.org/wiki/Ze_(Cyrillic) and not the Arabic numeral 3 https://en.m.wikipedia.org/wiki/3
Upvotes: 0