Reputation: 283043
Specifically, I want to match the range [#x10000-#xEFFFF]. AFAIK, the \u escape sequences only accept 4 hex digits, not 5. Is there a way to match higher values?
Upvotes: 2
Views: 763
Reputation: 868
Code points that need 5 hex digits are stored as surrogate pairs. Use the ES6 'u' (unicode) flag to create a surrogate-pair-aware regex; see the section 'Ranges and flag "u"' at https://javascript.info/regexp-character-sets-and-ranges
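// incorrect: without the u flag the class is built from surrogate halves, so only half a character matches
'😳'.match(/[😳😓]/)
// correct: with the u flag each surrogate pair is treated as one character
'😳'.match(/[😳😓]/u)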
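For the exact range in the question, a minimal sketch, assuming an ES2015+ engine: with the u flag a regex also accepts the \u{...} code point escape, which takes more than 4 hex digits. The variable name astral is just a placeholder.

```javascript
// With the `u` flag, \u{...} takes a full code point,
// so the range can be written directly.
const astral = /[\u{10000}-\u{EFFFF}]/u;

console.log(astral.test("\u{13FFA}")); // true  (inside the range)
console.log(astral.test("A"));         // false (BMP character)
console.log(astral.test("\u{F0000}")); // false (above the range)
```

Without the u flag, the \u{...} form is not read as a code point escape, so the flag is required here.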
Upvotes: 2
Reputation: 234847
Internally, JavaScript strings are sequences of 16-bit code units (UCS-2/UTF-16), so a single code unit is limited to the base plane (BMP). For higher-range characters, you will have to use surrogate pairs. For instance, to find U+13FFA, you can match \uD80F\uDFFA.
More details can be found here.
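In case it helps, here is a rough sketch of how the pair for U+13FFA is derived. The helper name toSurrogatePair is made up for illustration; it only handles code points above U+FFFF.

```javascript
// Split a supplementary code point (above 0xFFFF) into its
// UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x13FFA);
console.log(high.toString(16), low.toString(16)); // d80f dffa

// The escaped pair is the same string as the code point itself.
console.log("\uD80F\uDFFA" === String.fromCodePoint(0x13FFA)); // true
```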
Unfortunately, this doesn't work well within character classes in a regex. With BMP characters, you can do things like /[a-z]/. You can't do that with higher-range characters (at least not without the ES6 u flag) because JavaScript doesn't understand that surrogate pairs should be treated as a unit. You may be able to hunt around for third-party libraries that deal with this. Sadly, I don't know of any to recommend. This one might be worth a look. I've never used it, so I cannot attest to its quality.
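For a contiguous range like the one in the question, one possible workaround that needs no library is to use two character classes, one per surrogate half. This is only a sketch: the bounds assume U+10000 encodes as D800 DC00 and U+EFFFF as DB7F DFFF, and astralRange is a placeholder name.

```javascript
// U+10000..U+EFFFF without the `u` flag:
// a high surrogate in D800-DB7F followed by any low surrogate.
const astralRange = /[\uD800-\uDB7F][\uDC00-\uDFFF]/;

console.log(astralRange.test("\uD80F\uDFFA")); // true  (U+13FFA)
console.log(astralRange.test("z"));            // false (BMP character)
console.log(astralRange.test("\uDB80\uDC00")); // false (U+F0000)
```

This works because the range starts and ends on whole high-surrogate blocks; a range that cuts through the middle of a block would need extra alternatives.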
P.S. You may find this shim useful for dealing with higher-order characters generally.
Upvotes: 4
Reputation: 1134
Maybe something like this?
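// Matches the literal text "#x" followed by 5 hex digits, the first being 1-E.
// No g flag: with g, test() keeps state in lastIndex between calls.
var regex = /#x[1-9a-eA-E][0-9a-fA-F]{4}/;
console.log(regex.test("#x03FFA")); // false
console.log(regex.test("#x13FFA")); // true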
Upvotes: 0