Reputation: 30993
I want to strip all non-alphanumeric characters except the hyphen from a string in Python. How can I change this regular expression so that it matches any non-alphanumeric character except the hyphen?
re.compile(r'[\W_]')
Thanks.
Upvotes: 33
Views: 92964
Reputation: 36193
You could just use a negated character class instead:
re.compile(r"[^a-zA-Z0-9-]")
This will match anything that is not in the alphanumeric ranges or a hyphen. It also matches the underscore, as per your current regex.
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'
Notice that this also removes spaces (which may well be what you want).
Edit: SilentGhost has suggested that it is likely cheaper for the engine to process the pattern with a quantifier, in which case you can simply use:
re.compile(r"[^a-zA-Z0-9-]+")
The + simply causes each run of consecutive matching characters to be matched (and replaced) in one go.
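As a rough sketch of the difference, the snippet below runs both patterns over the same sample string. The variable names (single, runs, and so on) are made up for illustration. When the replacement is the empty string, the cleaned result is identical; the quantified version just makes fewer, longer matches. I have not measured whether that is actually faster in practice.

```python
import re

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

s = "some#%te_xt&with--##%--5 hy-phens *#"

# subn returns the new string together with the number of substitutions made.
cleaned_single, n_single = single.subn("", s)
cleaned_runs, n_runs = runs.subn("", s)

print(cleaned_single == cleaned_runs)   # same cleaned text either way
print(n_single, n_runs)                 # the quantified pattern needs fewer substitutions
```

Note that the two patterns only agree because the replacement is empty. With a non-empty replacement (say "_"), the quantified pattern would collapse each run into a single replacement character.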
Upvotes: 44
Reputation: 375484
\w matches alphanumerics plus the underscore; add in the hyphen, then negate the entire set: r"[^\w-]"
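One thing worth checking before relying on this: in Python, \w also covers the underscore, and for str patterns in Python 3 it covers non-ASCII letters and digits as well. The sketch below, using a made-up sample string, shows what the pattern keeps and two possible adjustments (an extra |_ alternative and the re.ASCII flag). Treat these as options under those assumptions, not as the one right answer.

```python
import re

s = "some#%te_xt&with--5 hy-phens é"

# Plain \w-based class: the underscore and the non-ASCII letter survive.
keep_word = re.sub(r"[^\w-]", "", s)

# Add an alternative for "_" to strip the underscore as well.
no_underscore = re.sub(r"[^\w-]|_", "", s)

# re.ASCII restricts \w to [a-zA-Z0-9_], so "é" is stripped too.
ascii_only = re.sub(r"[^\w-]|_", "", s, flags=re.ASCII)

print(keep_word)
print(no_underscore)
print(ascii_only)
```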
Upvotes: 9