OneXer

Reputation: 355

Combine whitelist and blacklist in javascript regex expression

I am having trouble constructing a regex that allows the full range of UTF-8 characters with the exception of two characters: _ and %

So the whitelist is: ^[\u0000-\uFFFF] and the blacklist is: ^[^_%]

I need to combine these into one expression.

I have tried the following code, but it does not work the way I had hoped:

function check(input) {
    var patrn = /[^\u0000-\uFFFF&&[^_%]]/g;
    // match() returns null when nothing matches
    var result = input.match(patrn);
    return result == "" || result == null;
}

check("this%");

input: this%

actual output: true

desired output: false

Upvotes: 0

Views: 711

Answers (3)

Laurel

Reputation: 6173

Underscore is \u005F and percent is \u0025. You can simply alter the range to exclude these two characters:

^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]

This will be just as fast as the original regex.
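As written, that pattern only checks the first character. A minimal sketch of whole-string validation, assuming the class is repeated with `*` and anchored with `$` (the helper name `isAllowed` is hypothetical):

```javascript
// Hypothetical helper: true when every UTF-16 code unit in `input`
// falls in the allowed ranges, i.e. anything except "%" (\u0025) and "_" (\u005F).
function isAllowed(input) {
  var pattern = /^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]*$/;
  return pattern.test(input);
}

console.log(isAllowed("this"));  // true
console.log(isAllowed("this%")); // false
console.log(isAllowed("a_b"));   // false
```

Note that `*` also accepts the empty string; use `+` instead if empty input should be rejected.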


But I don't think you will get the result you really want this way. A \uXXXX escape in JS only goes up to \uFFFF; any character beyond that is technically stored as two code units (a surrogate pair).

For example, the following code returns false, because the character is a surrogate pair and . matches only one code unit:

/^.$/.test('💩')

You need a different way to detect characters outside that range. Another answer gives the following code:

String.prototype.getCodePointLength = function () {
    return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
};

Simply put, if the number returned by that function differs from the string's .length property, the string contains at least one surrogate pair (and thus you should return false).

If your input passes that test, you can then run it against another regex that rejects the characters between \u0000 and \uFFFF that you want to exclude.
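The two steps could be combined roughly as follows. This is only a sketch: `codePointLength` and `isValid` are hypothetical names, and the standalone function is used here instead of extending `String.prototype`.

```javascript
// Counts code points: each surrogate pair is counted once.
function codePointLength(str) {
  return str.length - str.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
}

// Hypothetical validator: rejects surrogate pairs first, then "_" and "%".
function isValid(input) {
  if (codePointLength(input) !== input.length) {
    return false; // at least one character lies beyond \uFFFF
  }
  return !/[_%]/.test(input);
}

console.log(isValid("this"));         // true
console.log(isValid("this%"));        // false
console.log(isValid("\uD83D\uDCA9")); // false (a single surrogate pair)
```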

Upvotes: 1

ndnenkov

Reputation: 36110

Use a negative lookahead:

(?!_blacklist_)_whitelist_

In this case:

^(?:(?![_%])[\u0000-\uFFFF])*$
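A quick check of that pattern against the input from the question (a sketch; the helper name `passes` is hypothetical):

```javascript
// At each position the lookahead rejects "_" or "%",
// then the whitelist class consumes one code unit.
var pattern = /^(?:(?![_%])[\u0000-\uFFFF])*$/;

// Hypothetical helper wrapping the test.
function passes(input) {
  return pattern.test(input);
}

console.log(passes("this"));  // true
console.log(passes("this%")); // false
console.log(passes("a_b"));   // false
```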

Upvotes: 1

Oriol

Reputation: 288480

If I understand correctly, one of these should be enough:

/^[^_%]*$/.test(str);
!/[_%]/.test(str);
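Both expressions should give the same answer for any string. A small sketch comparing them (the function names `viaWhitelist` and `viaBlacklist` are hypothetical):

```javascript
// True when the whole string consists of characters other than "_" and "%".
function viaWhitelist(str) {
  return /^[^_%]*$/.test(str);
}

// True when no "_" or "%" occurs anywhere in the string.
function viaBlacklist(str) {
  return !/[_%]/.test(str);
}

["this", "this%", "a_b", ""].forEach(function (s) {
  console.log(JSON.stringify(s), viaWhitelist(s), viaBlacklist(s));
});
```

The second form is arguably easier to read, since it states the blacklist directly.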

Upvotes: 1
