Reputation: 355
I am having problems constructing a regex that will allow the full range of UTF-8 characters with the exception of 2 characters: _ and %
So the whitelist is: ^[\u0000-\uFFFF]
and the blacklist is: ^[^_%]
I need to combine these into one expression.
I have tried the following code, but it does not work the way I had hoped:
var input = "this%";
var patrn = /[^\u0000-\uFFFF&&[^_%]]/g;
if (input.match(patrn) == "" || input.match(patrn) == null) {
return true;
} else {
return false;
}
input: this%
actual output: true
desired output: false
Upvotes: 0
Views: 711
Reputation: 6173
Underscore is \u005F and percent is \u0025. You can simply alter the range to exclude these two characters:
^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]
This will be just as fast as the original regex.
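As a rough sketch of how that range could be used: the `*$` suffix (so the whole string is checked) and the helper name `isAllowedByRange` are my own assumptions, not part of the pattern above.

```javascript
// Covers \u0000-\uFFFF but skips % (\u0025) and _ (\u005F).
// The trailing *$ is an assumption so that every character is checked.
const allowedRange = /^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]*$/;

// Hypothetical helper name, used only for illustration.
function isAllowedByRange(input) {
  return allowedRange.test(input);
}

console.log(isAllowedByRange("this"));  // true
console.log(isAllowedByRange("this%")); // false
console.log(isAllowedByRange("a_b"));   // false
```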
But I don't think that you are going to get the result you really want this way. JS can only go up to \uFFFF; anything past that is technically two characters (a surrogate pair).
According to here, the following code returns false:
/^.$/.test('💩')
You need to have a different way to see if you have characters outside that range. This answer gives the following code:
String.prototype.getCodePointLength = function() {
  return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
};
Simply put, if the number returned by that is not the same as the string's .length (a property, not a method), you have at least one surrogate pair (and thus you should return false).
If your input passes that test, you can then run it against another regex that rejects the characters in \u0000-\uFFFF that you want to avoid.
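The two steps could be combined roughly like this. It is only a sketch: `countCodePoints` and `isValidInput` are hypothetical names, and I use a standalone function instead of extending `String.prototype`.

```javascript
// Counts code points by subtracting one for every surrogate pair found.
function countCodePoints(str) {
  return str.length - str.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
}

// Hypothetical two-step check: first reject surrogate pairs, then the blacklist.
function isValidInput(input) {
  if (countCodePoints(input) !== input.length) {
    return false; // contains a character outside \u0000-\uFFFF
  }
  return !/[_%]/.test(input);
}

console.log(isValidInput("this"));  // true
console.log(isValidInput("this%")); // false
console.log(isValidInput("💩"));    // false
```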
Upvotes: 1
Reputation: 36110
Use negative lookahead:
(?!_blacklist_)_whitelist_
In this case:
^(?:(?![_%])[\u0000-\uFFFF])*$
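A quick way to try this out, as a sketch only (the helper name `passesLookahead` is made up for the example):

```javascript
// For each position: fail if the next character is _ or %, otherwise consume one character.
const lookaheadPattern = /^(?:(?![_%])[\u0000-\uFFFF])*$/;

// Hypothetical helper name, used only for illustration.
function passesLookahead(input) {
  return lookaheadPattern.test(input);
}

console.log(passesLookahead("this"));  // true
console.log(passesLookahead("this%")); // false
console.log(passesLookahead("a_b"));   // false
```

Note that without the `u` flag, `[\u0000-\uFFFF]` matches each UTF-16 code unit, so surrogate pairs are accepted by this pattern as two separate matches.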
Upvotes: 1
Reputation: 288480
If I understand correctly, one of these should be enough:
/^[^_%]*$/.test(str);
!/[_%]/.test(str);
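Wrapped in a function, assuming the goal is the true/false result from the question (the name `hasNoBlacklistedChars` is hypothetical):

```javascript
// Returns true only when the string contains neither _ nor %.
function hasNoBlacklistedChars(str) {
  return !/[_%]/.test(str);
}

// The anchored form should agree with the negated search for any input.
function hasNoBlacklistedCharsAnchored(str) {
  return /^[^_%]*$/.test(str);
}

console.log(hasNoBlacklistedChars("this"));          // true
console.log(hasNoBlacklistedChars("this%"));         // false
console.log(hasNoBlacklistedCharsAnchored("this%")); // false
```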
Upvotes: 1