Reputation: 19628
I need to write a regular expression that keeps only the characters in the list below (that is, removes every character not in the list).
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
I don't know how to do it. Should I use re.match, re.findall, or re.sub?
Thanks a lot in advance.
Upvotes: 0
Views: 89
Reputation: 208475
Don't use regular expressions at all. First convert allow_characters to a set, then use ''.join() with a generator expression that strips out the unwanted characters. Assuming the string you are transforming is called s:
allow_char_set = set(allow_characters)
s = ''.join(c for c in s if c in allow_char_set)
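As a minimal sketch, the two lines above can be wrapped in a small helper. The function name keep_allowed and the sample input are only illustrative and are not part of the original question:

```python
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# Build the set once so each membership test is a constant-time lookup.
allow_char_set = set(allow_characters)

def keep_allowed(s):
    """Return s with every character not in allow_char_set removed (hypothetical helper)."""
    return ''.join(c for c in s if c in allow_char_set)

print(keep_allowed("hello, world! #tag_1.0"))  # helloworld#tag_1.0
```

The comma, spaces, and exclamation mark are dropped because they are not in allow_characters.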
That being said, here is how this might look with regex:
import re
s = re.sub(r'[^#.\-_a-zA-Z0-9]+', '', s)
You could convert your allow_characters string into this regex, but I think the first solution is significantly more straightforward.
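If you did want to derive the regex from allow_characters rather than write it by hand, one possible approach is to escape the string with re.escape and place it inside a negated character class. This is a sketch, and the names pattern and keep_allowed_re are made up for illustration:

```python
import re

allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# re.escape backslash-escapes characters such as '-' and '.', so every
# character is treated literally inside the [^...] class.
pattern = re.compile('[^' + re.escape(allow_characters) + ']+')

def keep_allowed_re(s):
    """Remove every run of characters that are not in allow_characters (hypothetical helper)."""
    return pattern.sub('', s)

print(keep_allowed_re("hello, world! #tag_1.0"))  # helloworld#tag_1.0
```

Compiling the pattern once avoids rebuilding it on every call.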
Edit: As pointed out by DSM in the comments, str.translate() is often a very good way to do something like this. In this case it is slightly complicated, but you can still use it like this:
import string
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
all_characters = string.maketrans('', '')
delete_characters = all_characters.translate(None, allow_characters)
s = s.translate(None, delete_characters)
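Note that the snippet above relies on Python 2: string.maketrans and the two-argument form of str.translate do not exist in Python 3. A rough Python 3 equivalent, assuming a dict subclass whose __missing__ returns None so that every unlisted code point is deleted, might look like this. The names KeepOnly and keep_allowed_translate are hypothetical:

```python
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

class KeepOnly(dict):
    """Translation table: allowed code points map to themselves,
    any other code point maps to None, which str.translate treats as 'delete'."""
    def __missing__(self, key):
        return None

# Keys are Unicode code points (ints), as str.translate expects in Python 3.
table = KeepOnly((ord(c), c) for c in allow_characters)

def keep_allowed_translate(s):
    """Return s with every character not in allow_characters removed."""
    return s.translate(table)

print(keep_allowed_translate("hello, world! #tag_1.0"))  # helloworld#tag_1.0
```

This avoids enumerating every possible character to delete, which is impractical for Unicode strings.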
Upvotes: 7