Reputation: 5625
According to this article:
Internally, JavaScript source code is treated as a sequence of UTF-16 code units.
And this IBM doc says that:
UTF-16 is based on 16-bit code units. Therefore, each character can be 16 bits (2 bytes) or 32 bits (4 bytes).
But when I tested in Chrome's console, English letters seemed to take only 1 byte, not 2 or 4:
new Blob(['a']).size === 1
Why is that the case? Am I missing something here?
Upvotes: 1
Views: 1623
Reputation: 111
Internally, JavaScript source code is treated as a sequence of UTF-16 code units.
Note that this refers to source code, not String values. The article also describes String values as UTF-16 later on:
When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.
The discrepancy here is actually in the Blob constructor. From MDN:
Note that strings here are encoded as UTF-8, unlike the usual JavaScript UTF-16 strings.
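As a quick illustration (the sample characters below are just ones I picked), comparing String.length, which counts UTF-16 code units, with new Blob([...]).size, which counts UTF-8 bytes, makes the difference visible:

const samples = ['a', 'ą', '€', '😀']
for (const s of samples) {
  // Blob encodes the string as UTF-8, so .size is the UTF-8 byte length
  console.log(s, 'UTF-16 code units:', s.length, 'UTF-8 bytes:', new Blob([s]).size)
}
// a  → 1 code unit,  1 byte
// ą  → 1 code unit,  2 bytes
// €  → 1 code unit,  3 bytes
// 😀 → 2 code units, 4 bytes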
Upvotes: 7
Reputation: 24661
UTF-8 is a variable-width encoding, so different characters take a different number of bytes. a encodes to 1 byte, but ą, for example, takes 2:
console.log('a', new Blob(['a']).size)
console.log('ą', new Blob(['ą']).size)
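If you'd rather not construct a Blob just to count bytes, a TextEncoder (which always produces UTF-8) gives the same numbers; here's a minimal sketch with the expected output in comments:

const encoder = new TextEncoder() // always encodes strings to UTF-8
console.log(encoder.encode('a').length)  // 1
console.log(encoder.encode('ą').length)  // 2
console.log(encoder.encode('€').length)  // 3
console.log(encoder.encode('😀').length) // 4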
Upvotes: -1