Reputation: 1504
I want to get specific letters from a Unicode string by index. However, it doesn't work as expected.
Example:
var handwriting = `𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890`
var normal = `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`
console.log(normal[3]) // gives 'd' but
console.log(handwriting[3]) // gives '�' instead of '𝖉'
Also, length doesn't work as expected: normal.length gives the correct value, 62, but handwriting.length gives 114.
Indexing doesn't work as expected either. How can I access the individual characters of a Unicode string?
I tried this in Python and it works perfectly, but in JavaScript it does not.
I need the exact characters from the Unicode string; for index 3 the expected output is 'd' and '𝖉'.
Upvotes: 0
Views: 273
Reputation: 35222
In JavaScript, a string is a sequence of 16-bit code units (UTF-16). These characters lie outside the Basic Multilingual Plane, so each one is represented by a pair of code units, known as a surrogate pair.
The code point of 𝖆 is U+1D586, and 0x1D586 is greater than 0xFFFF (2^16 - 1), the largest value a single 16-bit code unit can hold. So 𝖆 is stored as two code units:
console.log("𝖆".length) // 2
console.log("𝖆" === "\uD835\uDD86") // true
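The pair \uD835\uDD86 is not arbitrary; it follows from the UTF-16 encoding rule: subtract 0x10000 from the code point, put the top 10 bits of the result in a high surrogate starting at 0xD800, and the bottom 10 bits in a low surrogate starting at 0xDC00. A minimal sketch (toSurrogatePair is a hypothetical helper name, not a built-in) that reproduces the pair for 𝖆 and cross-checks it against the built-in charCodeAt() and codePointAt():

```javascript
// Hypothetical helper: split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint <= 0xFFFF || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
  return [high, low];
}

const a = "𝖆";
const [high, low] = toSurrogatePair(0x1D586);
console.log(high.toString(16), low.toString(16)); // d835 dd86
// charCodeAt() reads single code units, codePointAt() combines the pair.
console.log(a.charCodeAt(0) === high, a.charCodeAt(1) === low); // true true
console.log(a.codePointAt(0).toString(16)); // 1d586
```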
One way is to create an array of characters (code points) using the spread syntax or Array.from(), and then get the index you need:
var handwriting = `𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890`
console.log([...handwriting][3]) // 𝖉
console.log(Array.from(handwriting)[3]) // 𝖉
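Both approaches copy the whole string into an array on every lookup. If you only need one character, a sketch like the one below (nthCodePoint is a hypothetical helper name) walks the string with for...of, which, like the spread syntax, iterates by code point rather than by 16-bit code unit. It also shows why handwriting[3] printed '�': plain indexing counts code units, and index 3 is the second half of the surrogate pair for 𝖇.

```javascript
// Hypothetical helper: return the n-th code point of str (0-based), or undefined.
function nthCodePoint(str, n) {
  let i = 0;
  for (const ch of str) { // the string iterator yields whole code points
    if (i === n) return ch;
    i++;
  }
  return undefined;
}

const handwriting = "𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890";

console.log(nthCodePoint(handwriting, 3)); // 𝖉
// Plain indexing works on code units: indices 2 and 3 are the two halves of 𝖇.
console.log(handwriting[2] + handwriting[3] === "𝖇"); // true
```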
Upvotes: 3
Reputation: 358
A \uXXXX escape such as '\u00E9' holds a single 16-bit code unit, and characters above U+FFFF, like the ones in your string, need two of them, so a longer length is normal. To get the real length of a Unicode string in characters, convert it to an array:
let charArray = [...handwriting]
console.log(charArray.length) //=62
Each item of the array is one character (code point) of your string, so charArray[3] returns '𝖉'.
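If you only need the count, you can avoid allocating the array by counting iterations of for...of. The sketch below (countCodePoints is a hypothetical helper name) also accounts for the 114 from the question: the 52 Fraktur letters lie above U+FFFF and take two code units each, while the 10 ASCII digits take one, so the string is 52 × 2 + 10 = 114 code units but only 62 code points.

```javascript
// Hypothetical helper: count code points without building an intermediate array.
function countCodePoints(str) {
  let count = 0;
  for (const _ of str) count++; // one iteration per code point
  return count;
}

const handwriting = "𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅1234567890";

// Characters stored as surrogate pairs (code point above U+FFFF).
const astral = [...handwriting].filter((ch) => ch.codePointAt(0) > 0xFFFF).length;

console.log(handwriting.length);           // 114 (UTF-16 code units)
console.log(countCodePoints(handwriting)); // 62 (code points)
console.log(astral);                       // 52
```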
Upvotes: 2