G. Ramistella

Reputation: 1407

RegEx: Checking if string contains non whitelisted characters

I'd like to check whether a string contains any characters that are not in a whitelist. If it does, the string must be discarded.

The whitelist is currently abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?., and possibly others.

It is very important for me to load the whitelist from a string (like the one provided), because I might need to expand the whitelist later.

Upvotes: 1

Views: 887

Answers (2)

pault

Reputation: 43524

You don't need regex for this. Just check to see if any of the characters are not in the whitelist:

whitelist_set = set(whitelist)
if any(c not in whitelist_set for c in my_string):
    pass  # discard my_string

As @jpp mentioned in the comments, it's more efficient to first convert the whitelist to a set, because set lookups are O(1) on average rather than O(n) for a string or list.
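The check above can be wrapped in a small reusable function. This is a minimal sketch assuming Python 3; the names `WHITELIST` and `is_allowed` are illustrative and not from the question.

```python
# Whitelist loaded from a plain string, as the question requires.
WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"

def is_allowed(text, whitelist=WHITELIST):
    """Return True if every character of text appears in whitelist."""
    allowed = set(whitelist)  # set membership tests are O(1) on average
    return all(c in allowed for c in text)

# Keep only the strings made entirely of whitelisted characters.
# "100%" fails on '%', "two words" fails on the space.
candidates = ["Hello,world!", "100%", "two words"]
kept = [s for s in candidates if is_allowed(s)]
```

Note that the space character is not in the whitelist as given, so any string containing a space is discarded unless the whitelist string is extended.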

Upvotes: 3

dawg

Reputation: 104024

You can use .translate to delete the characters in the whitelist and then test if you have any characters left:

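>>> wl='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,'
>>> tbl=str.maketrans(dict.fromkeys(wl))  # map every whitelisted character to None (delete it)
>>> tst=tgt.translate(tbl)  # tgt is the string being checked
# If tst is non-empty, tgt contains non-whitelisted characters...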

That is the Python 3 version of translate. Python 2 would be:

>>> tgt.translate(None, wl)
# same test...
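For Python 3, the translate approach might be packaged as below. This is only a sketch: the helper names `leftover` and `should_discard` are hypothetical, and it uses the three-argument form of `str.maketrans` instead of the dict form shown above, which builds an equivalent deletion table.

```python
WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"

# Three-argument form: every character in the third argument maps to None,
# which makes translate() delete it. Build the table once and reuse it.
DELETE_ALLOWED = str.maketrans("", "", WHITELIST)

def leftover(text):
    """Return text with all whitelisted characters removed."""
    return text.translate(DELETE_ALLOWED)

def should_discard(text):
    """Return True if text contains any non-whitelisted character."""
    # Anything that survives the deletion was not in the whitelist.
    return bool(leftover(text))
```

Building the table once at module level avoids rebuilding it for every string when many strings are checked against the same whitelist.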

You can also use set arithmetic:

>>> bool(set(tgt)-set(wl))  # True means tgt contains non-whitelisted characters, so discard it
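A side benefit of the set difference is that it also tells you which characters caused the rejection. A small sketch, assuming Python 3 (the function name `offending_characters` is made up for illustration):

```python
WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"

def offending_characters(text, whitelist=WHITELIST):
    """Return the sorted, de-duplicated characters of text not in whitelist."""
    return sorted(set(text) - set(whitelist))

bad = offending_characters("price: 100%")  # [' ', '%', ':']
if bad:
    print("discarding; disallowed characters:", bad)
```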

If you want to use a regex:

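>>> import re
>>> re.search('[^'+re.escape(wl)+']',tgt)  # a match (not None) means discard; re.escape keeps ], \, ^ and - literal if the whitelist grows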
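Because the whitelist may be expanded later, it could pick up characters that are special inside a character class. The sketch below, assuming Python 3 and a non-empty whitelist, escapes the whitelist before building the pattern; `build_disallowed_pattern` and `has_disallowed` are hypothetical names, and the short whitelist is chosen only to exercise the awkward characters.

```python
import re

def build_disallowed_pattern(whitelist):
    """Compile a pattern matching any single character not in whitelist.

    Assumes whitelist is non-empty; an empty one would give the invalid
    pattern "[^]".
    """
    # re.escape keeps characters such as ], \, ^ and - literal inside [...].
    return re.compile("[^" + re.escape(whitelist) + "]")

# Illustrative whitelist containing characters that are special in a class.
pattern = build_disallowed_pattern("abc0123456789-]^\\")

def has_disallowed(text):
    """Return True if text contains a character outside the whitelist."""
    return pattern.search(text) is not None
```

Without the escaping step, a whitelist containing `]` would close the character class early and the pattern would either fail to compile or silently match the wrong set of characters.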

Upvotes: 0
