Reputation: 1407
I'd like to check whether a string contains any characters that are not in a whitelist; if it does, the string must be discarded.
The whitelist is currently:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,
and possibly other characters.
It is very important for me to load the whitelist from a string (like the one provided), because I might need to expand the whitelist later.
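For reference, the whitelist above can be assembled from the standard string module constants, which keeps it a plain string that is easy to extend later. This is a minimal sketch; the names WHITELIST and EXTENDED_WHITELIST, and the extra space and hyphen, are hypothetical.

```python
import string

# string.ascii_letters is a-z followed by A-Z; string.digits is 0-9.
# Concatenating them with the punctuation reproduces the whitelist above.
WHITELIST = string.ascii_letters + string.digits + "!?.,"

# Expanding the whitelist later is just string concatenation
# (the space and hyphen are hypothetical additions).
EXTENDED_WHITELIST = WHITELIST + " -"

print(WHITELIST == "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,")  # True
```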
Upvotes: 1
Views: 887
Reputation: 43524
You don't need regex for this. Just check to see if any of the characters are not in the whitelist:
whitelist_set = set(whitelist)  # whitelist is the string of allowed characters
if any(c not in whitelist_set for c in my_string):
    pass  # discard my_string here
As @jpp mentioned in the comments, it's more efficient to convert the whitelist to a set first, because set membership tests are O(1) on average, rather than O(n) for a list or string.
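A minimal sketch of the same check packaged for reuse, assuming many strings are filtered against one whitelist; make_whitelist_checker and the sample strings are hypothetical. The set is built once, and all(c in allowed ...) is simply the negation of the any(c not in ...) test, with the same short-circuiting.

```python
def make_whitelist_checker(whitelist):
    """Build the set once and return a predicate reusable on many strings."""
    allowed = frozenset(whitelist)  # average O(1) membership tests

    def is_whitelisted(text):
        # True only if every character of text is in the whitelist.
        return all(c in allowed for c in text)

    return is_whitelisted


WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"
is_whitelisted = make_whitelist_checker(WHITELIST)

candidates = ["Hello!", "Hello world", "ok?", "tab\there"]
# Keep strings made only of whitelisted characters; discard the rest.
kept = [s for s in candidates if is_whitelisted(s)]
print(kept)  # ['Hello!', 'ok?']
```

Because the whitelist is passed in as a string, expanding it later only means calling make_whitelist_checker again with the longer string.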
Upvotes: 3
Reputation: 104024
You can use .translate to delete the whitelisted characters from the string being tested (tgt) and then check whether any characters are left:
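>>> wl = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,'
>>> tgt = 'Hello world'   # example string to check
>>> tbl = str.maketrans(dict.fromkeys(wl))   # maps each whitelisted character to None (delete)
>>> tst = tgt.translate(tbl)
>>> tst
' '
# A non-empty tst means tgt contains non-whitelisted characters, so discard it.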
That is the Python 3 version of translate. Python 2 would be:
>>> tgt.translate(None, wl)
# same test...
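A Python 3 sketch of the translate approach as a reusable function; WHITELIST, DELETE_WHITELISTED and has_disallowed are hypothetical names. It uses the three-argument form of str.maketrans, which builds the same deletion table as str.maketrans(dict.fromkeys(wl)).

```python
WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"

# The third argument of str.maketrans lists characters to map to None,
# so translate() deletes them. The table is built once and reused.
DELETE_WHITELISTED = str.maketrans("", "", WHITELIST)


def has_disallowed(text):
    """Return True if anything survives deleting the whitelisted characters."""
    return bool(text.translate(DELETE_WHITELISTED))


print(has_disallowed("Hello!"))       # False
print(has_disallowed("Hello world"))  # True: the space is not whitelisted
```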
You can also use set arithmetic:
>>> if set(tgt) - set(wl):
...     pass  # tgt has at least one non-whitelisted character: discard it
...
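The set difference also tells you which characters caused the rejection, which can help when deciding whether to extend the whitelist. A minimal sketch; disallowed_chars and the sample string are hypothetical.

```python
def disallowed_chars(text, whitelist):
    """Return the set of characters in text that are not in whitelist."""
    return set(text) - set(whitelist)


WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"
bad = disallowed_chars("Hello world #1", WHITELIST)
if bad:
    # An empty set is falsy, so this branch runs only for rejected strings.
    print("discarding, found:", sorted(bad))  # discarding, found: [' ', '#']
```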
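If you want to use a regex, escape the whitelist so that characters such as ], \, ^ or - are treated literally inside the character class:
>>> import re
>>> re.search('[^' + re.escape(wl) + ']', tgt)
# A match object (rather than None) means tgt contains a non-whitelisted character.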
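A sketch of the regex approach with the pattern compiled once, assuming a non-empty whitelist; compile_disallowed_pattern is a hypothetical helper. The last lines show why re.escape matters if the whitelist later gains characters that are special inside a character class.

```python
import re


def compile_disallowed_pattern(whitelist):
    """Compile a regex matching any single character NOT in whitelist.

    Assumes a non-empty whitelist: an empty one would produce "[^]",
    which Python's re module rejects as an unterminated character set.
    """
    return re.compile("[^" + re.escape(whitelist) + "]")


WHITELIST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!?.,"
pattern = compile_disallowed_pattern(WHITELIST)
print(pattern.search("Hello!") is None)  # True: nothing disallowed
print(pattern.search("a-b") is None)     # False: '-' is not whitelisted

# Hypothetical later extension with characters that are special in a class.
extended = compile_disallowed_pattern(WHITELIST + "-]^\\")
print(extended.search("a-b]^\\") is None)  # True
```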
Upvotes: 0