Reputation: 1024
If a string contains a character from the range U+D800..U+DFFF, then `encodeURIComponent()` throws a "malformed URI sequence" error. I would like to eliminate those characters from a given string before passing it to `encodeURIComponent()`. How can I do that?
Example: I have a text file encoded in UTF-16BE that contains the following hexadecimal code units:
D7FF D800 D801 ... DFFE DFFF E000
I'm looking for a function that turns the string above into this one:
D7FF E000
So only valid Unicode characters remain.
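To make the failure concrete, here is a minimal reproduction sketch. The helper name `tryEncode` is made up for illustration. It also shows that, as far as the standard behavior goes, only unpaired surrogates trigger the error, while a valid high+low pair encodes fine:

```javascript
// Hypothetical helper: returns the encoded string,
// or the error's name if encodeURIComponent throws.
function tryEncode(str) {
  try {
    return encodeURIComponent(str);
  } catch (err) {
    return err.name;
  }
}

console.log(tryEncode("\uD800"));       // URIError (lone high surrogate)
console.log(tryEncode("\uDFFF"));       // URIError (lone low surrogate)
console.log(tryEncode("\uD83D\uDE00")); // %F0%9F%98%80 (valid surrogate pair)
console.log(tryEncode("\uD7FF\uE000")); // %ED%9F%BF%EE%80%80 (no surrogates)
```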
Upvotes: 2
Views: 150
Reputation: 31692
You can combine `replace` and `encodeURIComponent` to achieve the desired result. First match every run of characters that falls outside the surrogate range [0xD800..0xDFFF] with the regex `/[^\uD800-\uDFFF]+/g`, then replace each run with its encoded version:
let result = string.replace(/[^\uD800-\uDFFF]+/g, match => encodeURIComponent(match));
Example:
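// U+D7FF is a valid character; U+D800 and U+D801 are surrogates.
let string = "/foo/\uD7FF\uD800\uD801/bar";
// Only the non-surrogate runs are encoded; the surrogates are left in place untouched.
let result = string.replace(/[^\uD800-\uDFFF]+/g, match => encodeURIComponent(match));
console.log(result); // "%2Ffoo%2F%ED%9F%BF", then the two raw surrogates, then "%2Fbar"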
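Note that the snippet above encodes everything around the surrogates but keeps the surrogate code units themselves in the result. If the goal is to drop them before encoding, as the question asks, a sketch along these lines may help. The function names `stripAllSurrogates` and `stripLoneSurrogates` are hypothetical, not part of any library:

```javascript
// Hypothetical helper: remove every UTF-16 code unit in
// U+D800..U+DFFF, paired or not. Without the `u` flag the
// regex works on individual code units.
function stripAllSurrogates(str) {
  return str.replace(/[\uD800-\uDFFF]/g, "");
}

// Hypothetical helper: remove only unpaired surrogates.
// With the `u` flag a valid high+low pair is read as a single
// code point above U+FFFF, so the class matches lone surrogates only.
function stripLoneSurrogates(str) {
  return str.replace(/[\uD800-\uDFFF]/gu, "");
}

// The question's example, shortened to a few code units.
const input = "\uD7FF\uD800\uD801\uDFFE\uDFFF\uE000";
const cleaned = stripAllSurrogates(input);
console.log(cleaned === "\uD7FF\uE000");  // true
console.log(encodeURIComponent(cleaned)); // %ED%9F%BF%EE%80%80

// Keeping valid pairs (here U+1F600) while dropping a lone surrogate.
const mixed = stripLoneSurrogates("a\uD800b\uD83D\uDE00c");
console.log(encodeURIComponent(mixed));   // ab%F0%9F%98%80c
```

Which variant fits depends on the data: the first matches the question literally, the second preserves characters outside the Basic Multilingual Plane such as emoji. Newer runtimes also provide `String.prototype.toWellFormed()`, which replaces lone surrogates with U+FFFD instead of removing them.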
Upvotes: 1