rxmnnxfpvg

Reputation: 30993

How to exclude a character from a regex group?

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string in Python. How can I change this regular expression so that it matches any non-alphanumeric character except the hyphen?

re.compile(r'[\W_]')

Thanks.

Upvotes: 33

Views: 92964

Answers (2)

eldarerathis

Reputation: 36193

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]")

This matches any character that is not an ASCII letter, a digit, or a hyphen. Like your current regex, it also matches the underscore.

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Notice that this also removes spaces (which may well be what you want).


Edit: SilentGhost has suggested that it may be cheaper for the engine to process the pattern with a quantifier, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + makes each run of consecutive matching characters match (and be replaced) as a single unit.
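
To illustrate the effect of the quantifier, here is a small sketch that reuses the sample string from above. The variable names are just for illustration, and it uses re.subn only to count how many substitutions each pattern performs; it makes no claim about actual timing.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

# subn returns (new_string, number_of_substitutions_made)
cleaned_single, n_single = single.subn("", s)
cleaned_runs, n_runs = runs.subn("", s)

print(cleaned_single == cleaned_runs)  # same cleaned string either way
print(n_single, n_runs)                # the + version performs fewer substitutions
```

Both patterns give the same result when the replacement is the empty string. They differ if you replace with something non-empty (for example a space), because the + version inserts one replacement per run rather than one per character.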

Upvotes: 44

Ned Batchelder

Reputation: 375484

\w matches alphanumerics (and the underscore), so add in the hyphen, then negate the entire set: r"[^\w-]". Note that, unlike [\W_], this leaves underscores in place.
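
A quick sketch of how this differs from the explicit ASCII ranges, assuming Python 3, where \w on str patterns is Unicode-aware by default. The sample string and variable names are made up for illustration.

```python
import re

s = "na\u00efve_text-with#symbols"  # contains a non-ASCII letter and an underscore

# \w keeps the underscore and, in Python 3, non-ASCII letters too.
keep_word = re.sub(r"[^\w-]", "", s)

# Adding "|_" also strips the underscore, closer to the original [\W_].
no_underscore = re.sub(r"[^\w-]|_", "", s)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_word = re.sub(r"[^\w-]", "", s, flags=re.ASCII)

# Explicit ranges: ASCII letters, digits and the hyphen only.
ascii_only = re.sub(r"[^a-zA-Z0-9-]", "", s)

print(keep_word)
print(no_underscore)
print(ascii_word)
print(ascii_only)
```

Which variant is appropriate depends on whether underscores and non-ASCII letters should count as "alphanumeric" for your input.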

Upvotes: 9
