user3719454

Reputation: 1024

Exclude characters from string before passing it to encodeURIComponent()

If a string contains a character in the range U+D800..U+DFFF, encodeURIComponent() throws a malformed URI sequence error. I would like to remove those characters from a given string before passing it to encodeURIComponent(). How can I do that?

Example: I have a text file encoded in UTF-16BE which contains the following hexadecimal code units:

D7FF D800 D801 ... DFFE DFFF E000

I'm looking for a function that, given the string above, returns this string:

D7FF E000

So only valid Unicode characters remain.

Upvotes: 2

Views: 150

Answers (1)

ibrahim mahrir

Reputation: 31692

You can use a replace/encodeURIComponent combo to avoid the error. First match every run of characters that falls outside the Unicode range [0xD800..0xDFFF] using the regex /[^\uD800-\uDFFF]+/g, then replace each run with its encoded version. Note that encodeURIComponent() never sees the surrogate code units this way, so they are left in the result unencoded rather than removed:

let result = string.replace(/[^\uD800-\uDFFF]+/g, match => encodeURIComponent(match));

Example:

let string = "/foo/\uD7FF\uD800\uD801/bar";

let result = string.replace(/[^\uD800-\uDFFF]+/g, match => encodeURIComponent(match));

console.log(result);
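
The snippet above keeps the surrogate code units in the output. If the goal is to drop them before encoding, as the question asks, a minimal sketch could look like the following. The helper names stripAllSurrogates and stripLoneSurrogates are made up for illustration. The first removes every code unit in U+D800..U+DFFF, which matches the output requested in the question; the second assumes that valid high/low pairs (for example emoji) should survive, since encodeURIComponent() only throws on unpaired surrogates.

```javascript
// Hypothetical helper: remove every UTF-16 code unit in U+D800..U+DFFF.
// This also destroys valid surrogate pairs such as emoji.
function stripAllSurrogates(str) {
  return str.replace(/[\uD800-\uDFFF]/g, "");
}

// Hypothetical helper: remove only unpaired surrogates.
// The first alternative consumes a valid high+low pair (kept as-is);
// the second catches any remaining, lone surrogate (dropped).
function stripLoneSurrogates(str) {
  return str.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    match => (match.length === 2 ? match : "")
  );
}

const fromQuestion = "\uD7FF\uD800\uD801\uDFFE\uDFFF\uE000";
console.log(encodeURIComponent(stripAllSurrogates(fromQuestion)));
// "%ED%9F%BF%EE%80%80"

const withEmoji = "a\uD800b\uD83D\uDE00c\uDC00";
console.log(encodeURIComponent(stripLoneSurrogates(withEmoji)));
// "ab%F0%9F%98%80c"
```

Neither regex uses the u flag, so both operate on individual UTF-16 code units, which is what makes matching a lone surrogate possible.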

Upvotes: 1
