For example, given `const words = 'a̋b̋';`, `words.length` is 4, but we expect 2 for the "real" length.
Alternatively, is there a safe way to iterate over all the characters in `words` above?
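The count of 4 comes from UTF-16 code units: each letter is followed by a separate combining accent. The minimal check below assumes that accent is U+030B COMBINING DOUBLE ACUTE ACCENT and writes the string with escapes so the invisible marks are explicit. It shows that the built-in tools split by code unit or code point, never by user-perceived character. NFC normalization cannot help either, because Unicode has no precomposed "a̋" or "b̋".

```javascript
// 'a̋b̋' written with explicit escapes (assumption: the accent is U+030B,
// COMBINING DOUBLE ACUTE ACCENT, placed after each base letter).
const words = 'a\u030Bb\u030B';

console.log(words.length);                  // 4: UTF-16 code units
console.log([...words].length);             // 4: spread and for...of split by code point
console.log(words.normalize('NFC').length); // 4: no precomposed form exists to merge into

for (const ch of words) {
  // Logs each code point in hex: 61, 30b, 62, 30b
  console.log(ch.codePointAt(0).toString(16));
}
```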
There's nothing built into JavaScript that will help you differentiate those combining marks from other characters. You could build something, of course, using the reference information from http://unicode.org. :-)
...but at least one person seems to have already done so for you: https://github.com/orling/grapheme-splitter
Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the Unicode UAX #29 standard.
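```javascript
// Assumes Node.js with the grapheme-splitter package installed (npm install grapheme-splitter)
const GraphemeSplitter = require('grapheme-splitter');

const words = 'a̋b̋';
const splitter = new GraphemeSplitter();
const graphemes = splitter.splitGraphemes(words); // array of grapheme clusters
console.log(graphemes);
```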
That results in two entries in `graphemes`: "a̋" and "b̋".
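If you would rather not add a dependency, the "build something yourself" route can be approximated for simple inputs like this one. The sketch below is a swapped-in, much cruder technique than the UAX #29 rules grapheme-splitter implements: it uses a regular expression with Unicode property escapes (ES2018 and later) to attach every combining mark to the code point before it. The helper name `splitOnCombiningMarks` is hypothetical, and emoji ZWJ sequences, flag pairs, Hangul jamo and similar cases will not be grouped correctly.

```javascript
// Hypothetical helper: a rough approximation, NOT full UAX #29 segmentation.
// \P{M}\p{M}* = one non-mark code point followed by any combining marks.
// \p{M}+      = marks with no base before them (e.g. at the start), kept as-is.
// The 'u' flag makes the regex operate on code points and enables \p{...}.
function splitOnCombiningMarks(str) {
  return str.match(/\P{M}\p{M}*|\p{M}+/gu) || []; // match() returns null for ''
}

const words = 'a\u030Bb\u030B'; // same as 'a̋b̋'
const graphemes = splitOnCombiningMarks(words);
console.log(graphemes.length); // 2
for (const g of graphemes) {
  console.log(g); // logs "a̋", then "b̋"
}
```

For anything beyond plain letters with accents, the library's full UAX #29 implementation is the safer choice.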