Reputation: 1
convert.onclick =
function() {
for (var i = 0; i < before.value.length; i++) {
after.value += "'" + before.value.charAt(i) + "', ";
}
}
<textarea id="before" type="text" name="input" style="width:100%;">*π‘(π)-_=+π’βπ¨πππΌπ£βπ₯ππͺππ¦π</textarea><br />
<textarea id="after" cols="50" rows="10" name="output" style="width:100%;"></textarea>
<button id="convert" name="convert" type="button">convert</button>
Here's a simple snippet. When I run it, I get the following results.
Some characters are converted successfully, but most of the Unicode characters are not displayed correctly. How do I fix this?
Upvotes: 0
Views: 314
Reputation: 56770
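That is because JavaScript strings are sequences of UTF-16 code units, and characters above U+FFFF take two code units, so their length is 2 and charAt returns only half of the character.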
console.log("9".length);
console.log("π‘".length);
console.log("π‘".charAt(0));
console.log(String.fromCodePoint("π‘".codePointAt(0)));
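To make the difference between code units and code points concrete, here is a minimal sketch. The sample character U+1D7D8 is my own choice for illustration (it is not taken from the question) and is written as an escape so the example does not depend on file encoding.

```javascript
// U+1D7D8 lies outside the Basic Multilingual Plane, so it needs two UTF-16 code units.
const sample = "9\u{1D7D8}";

const codeUnits = sample.length;        // length counts UTF-16 code units
const codePoints = [...sample].length;  // string iteration walks code points

console.log(codeUnits);   // 3
console.log(codePoints);  // 2

// charAt(1) returns only the first half of the pair (the high surrogate).
const half = sample.charAt(1);
console.log(half.charCodeAt(0).toString(16));     // d835

// codePointAt(1) reads the whole pair starting at that index.
console.log(sample.codePointAt(1).toString(16));  // 1d7d8
```

This is why the loop in the question, which calls charAt for every index, emits two broken halves for each such character.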
To fix it, iterate over code points instead of calling charAt, using for...of together with codePointAt and String.fromCodePoint:
convert.onclick =
function() {
for (const char of before.value) {
after.value += `'${String.fromCodePoint(char.codePointAt(0))}', `; // ", " keeps the question's output format
}
}
<textarea id="before" type="text" name="input" style="width:100%;">*π‘(π)-_=+π’βπ¨πππΌπ£βπ₯ππͺππ¦π</textarea><br />
<textarea id="after" cols="50" rows="10" name="output" style="width:100%;"></textarea>
<button id="convert" name="convert" type="button">convert</button>
You can also do an index-based traversal, but that requires advancing the index variable inside the loop by the length of the character just processed:
convert.onclick =
function() {
for (let i = 0; i < before.value.length; ) {
after.value += `'${String.fromCodePoint(before.value.codePointAt(i))}', `; // ", " keeps the question's output format
i+= String.fromCodePoint(before.value.codePointAt(i)).length;
}
}
<textarea id="before" type="text" name="input" style="width:100%;">*π‘(π)-_=+π’βπ¨πππΌπ£βπ₯ππͺππ¦π</textarea><br />
<textarea id="after" cols="50" rows="10" name="output" style="width:100%;"></textarea>
<button id="convert" name="convert" type="button">convert</button>
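The index-based idea can also be sketched as a standalone function that does not touch the DOM. The name splitByCodePoint is hypothetical, and the step size is derived from the code point value rather than from the string length, which is an equivalent variation on the loop above and not the answer's exact code.

```javascript
// Hypothetical helper: split a string into code points using a manual index.
function splitByCodePoint(text) {
  const parts = [];
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i);
    parts.push(String.fromCodePoint(cp));
    // Code points above U+FFFF occupy two UTF-16 code units, all others one.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return parts;
}

const parts = splitByCodePoint("a\u{1D7D8}b");
console.log(parts.length); // 3
```

A lone surrogate is passed through as a single one-unit element here, since codePointAt returns the surrogate's own value in that case.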
Upvotes: 0
Reputation: 11060
What you're running into are called surrogate pairs. JavaScript strings are UTF-16, so some Unicode characters are stored as two 16-bit code units instead of one, and if you separate them, they no longer display correctly.
If you can use ES6, iterating a string with the spread operator or for...of syntax takes surrogate pairs into account and gives you correct results more easily. Other answers show how to do this.
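For reference, the arithmetic behind a surrogate pair can be sketched as follows. The function name toSurrogatePair and the sample code point U+1D7D8 are my own choices for illustration; the function assumes its input is above U+FFFF.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1D7D8);
console.log(high.toString(16), low.toString(16)); // d835 dfd8

// Putting both halves back together yields the original character.
const whole = String.fromCharCode(high, low);
console.log(whole === "\u{1D7D8}"); // true
console.log(whole.length);          // 2
```

The ranges checked by code that handles surrogates by hand (0xD800 to 0xDBFF for the high half, 0xDC00 to 0xDFFF for the low half) follow directly from this encoding.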
If you can't use ES6, MDN has an example of how to handle these with charAt. I'll use that code below.
function getWholeChar(str, i) {
var code = str.charCodeAt(i);
if (isNaN(code)) return ''; // global isNaN is available before ES6; Number.isNaN is not
if (code < 0xD800 || code > 0xDFFF) return str.charAt(i);
if (0xD800 <= code && code <= 0xDBFF) {
if (str.length <= (i + 1)) throw 'High surrogate without following low surrogate';
var next = str.charCodeAt(i + 1);
if (0xDC00 > next || next > 0xDFFF) throw 'High surrogate without following low surrogate';
return str.charAt(i) + str.charAt(i + 1);
}
if (i === 0) throw 'Low surrogate without preceding high surrogate';
var prev = str.charCodeAt(i - 1);
if (0xD800 > prev || prev > 0xDBFF) throw 'Low surrogate without preceding high surrogate';
return false;
}
convert.onclick =
function() {
for (var i = 0, chr; i < before.value.length; i++) {
if(!(chr = getWholeChar(before.value, i))) continue;
after.value += "'" + chr + "', ";
}
}
<textarea id="before" type="text" name="input" style="width:100%;">*π‘(π)-_=+π’βπ¨πππΌπ£βπ₯ππͺππ¦π</textarea><br />
<textarea id="after" cols="50" rows="10" name="output" style="width:100%;"></textarea>
<button id="convert" name="convert" type="button">convert</button>
Upvotes: 2
Reputation: 8060
You can use the spread operator (...) to create an array of Unicode characters (one element per code point):
convert.onclick = function () {
after.value = [...before.value].map(s => `'${s}'`).join(",");
};
<textarea id="before" type="text" name="input" style="width:100%;">*π‘(π)-_=+π’βπ¨πππΌπ£βπ₯ππͺππ¦π</textarea><br />
<textarea id="after" cols="50" rows="10" name="output" style="width:100%;"></textarea>
<button id="convert" name="convert" type="button">convert</button>
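The same idea can be packaged as a small pure function, which is easier to test than a click handler. This is only a sketch: the name quoteCodePoints is hypothetical, and Array.from with a mapping function is used here as an equivalent of spreading and then calling map.

```javascript
// Hypothetical helper: wrap every code point of the input in single quotes.
function quoteCodePoints(text) {
  // Array.from iterates the string by code point, like spread and for...of.
  return Array.from(text, (ch) => `'${ch}'`).join(", ");
}

const quoted = quoteCodePoints("a\u{1D7D8}b");
console.log(quoted === "'a', '\u{1D7D8}', 'b'"); // true
```

Note that this splits by code point only. Characters built from several code points, such as a letter followed by a combining accent, would still be separated.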
Upvotes: 1