Adam

Reputation: 5253

How to match all 4-byte UTF-8 characters in JavaScript?

I've tried many variations, such as /[\u0FFF-\uFFFF]/, but none of them worked as I expected.

I ask because the MySQL version I use doesn't support these characters and truncates strings at an emoticon or similar character. Upgrading MySQL to a newer version is not an option at the moment.

Upvotes: 2

Views: 2324

Answers (1)

robertklep

Reputation: 203534

According to this, code points U+10000 to U+10FFFF are encoded with 4 bytes in UTF-8.
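To check that range yourself, here is a small sketch that counts the UTF-8 bytes of a few sample characters. It assumes a Node environment (Buffer is Node-specific), and utf8ByteLength is just a name picked for illustration:

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
// Buffer.byteLength is Node's built-in byte counter for a given encoding.
function utf8ByteLength(text) {
  return Buffer.byteLength(text, 'utf8');
}

// One sample per UTF-8 width: ASCII, Latin-1 supplement, BMP symbol, emoji.
for (const ch of ['a', 'é', '€', '😈']) {
  const codePoint = ch.codePointAt(0).toString(16).toUpperCase();
  console.log(`U+${codePoint} -> ${utf8ByteLength(ch)} byte(s)`);
}
```

Only the last sample lies above U+FFFF, so it is the only one that needs 4 bytes.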

With a recent enough Node version (v6; possibly v5 as well, but I didn't test it), you can use those code points in a regular expression like this (note the u flag):

const str = 'hello world😈!';

console.log( /[\u{10000}-\u{10FFFF}]/u.test(str) );         // true
console.log( str.replace(/[\u{10000}-\u{10FFFF}]/gu, '') ); // `hello world!`

(more info here)
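If the u flag isn't available in your runtime, one possible fallback is to match surrogate pairs directly, because JavaScript strings store each code point above U+FFFF as a high surrogate followed by a low surrogate. This is a sketch rather than a drop-in fix; stripAstral is a made-up helper name, and lone (unpaired) surrogates are left untouched:

```javascript
// Hypothetical helper: remove every character that needs 4 bytes in UTF-8,
// without relying on the `u` regex flag.
function stripAstral(input) {
  // High surrogate (D800-DBFF) + low surrogate (DC00-DFFF) together
  // represent one code point in the range U+10000..U+10FFFF.
  return input.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, '');
}

const sample = 'hello world😈!';
console.log(stripAstral(sample)); // hello world!
```

This also shows why a range like /[\u0FFF-\uFFFF]/ misbehaves: without the u flag the regex works on individual 16-bit code units, so it cannot tell an emoji apart from ordinary characters below U+10000.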

Upvotes: 6
