Reputation: 10122
Using Python's re module, how can I get the equivalent of "\w" (which matches alphanumeric characters) WITHOUT matching the numeric characters (those which can be matched by "[0-9]")?
Note that the basic need is to match any word character (including all Unicode variants) except numeric characters (which are matched by "[0-9]").
As a final note, I really need a regexp, as it is part of a larger regexp.
Underscores should not be matched.
Upvotes: 10
Views: 6631
Reputation: 13247
You want [^\W\d]: the set of characters that are not (either a digit or a non-alphanumeric), i.e. word characters minus digits. Add an underscore to that negated set, giving [^\W\d_], if you don't want underscores either.
A bit twisted, if you ask me, but it works. Should be faster than the lookahead alternative.
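For anyone who wants to try it, here is a minimal sketch of the negated-class approach on Python 3, where str patterns are Unicode-aware by default. The names LETTERS and sample are made up for this illustration, and the sample string is an arbitrary test input.

```python
import re

# Negated class: match anything that is NOT a non-word character,
# NOT a digit, and NOT an underscore. What remains is letters.
LETTERS = re.compile(r"[^\W\d_]+")

# Hypothetical sample text with accented letters, an underscore and digits.
sample = "h\u00e9llo_w\u00f6rld 42abc"

# Each run of letters is returned separately; "_", " " and "42" split the runs.
print(LETTERS.findall(sample))
```

Since it is an ordinary character class, it can be dropped into a larger pattern anywhere \w would go.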
Upvotes: 38
Reputation: 338238
(?!\d)\w
A position that is not followed by a digit, and then \w. Effectively cancels out digits but allows the \w range by using a negative look-ahead.
The same could be expressed as a positive look-ahead and \D:
(?=\D)\w
To match several of these in a row, wrap it in a non-capturing group:
(?:(?!\d)\w)+
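A quick sketch to check the look-ahead variants against each other on Python 3. The variable names and the sample string are made up for this illustration. Note that these patterns still match underscores, since "_" is part of \w; the last pattern is one possible way to exclude it, by adding "_" to the look-ahead.

```python
import re

# Negative look-ahead: a word character that is not a digit.
NEG_LOOKAHEAD = re.compile(r"(?:(?!\d)\w)+")
# Positive look-ahead: a word character that is also a non-digit.
POS_LOOKAHEAD = re.compile(r"(?:(?=\D)\w)+")
# Variant (an assumption about what the asker wants): also reject underscores.
NO_UNDERSCORE = re.compile(r"(?:(?![\d_])\w)+")

# Hypothetical sample text with digits, an underscore and an accented letter.
sample = "abc123_d\u00e9f 7x"

neg_matches = NEG_LOOKAHEAD.findall(sample)
pos_matches = POS_LOOKAHEAD.findall(sample)
no_underscore = NO_UNDERSCORE.findall(sample)

print(neg_matches)
print(pos_matches)
print(no_underscore)
```

The group is non-capturing so that findall returns the whole match rather than the last repeated character.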
Upvotes: 9