Regex for word exclusion in Python

Question

I have a regular expression '[\w_-]+' which allows alphanumeric characters, underscores, and hyphens.

I have a set of words in a Python list which I don't want to allow:

listIgnore = ['summary', 'config']

What changes need to be made in the regex?

P.S: I am new to regex

korylprince · Accepted Answer

This question intrigued me, so I set about finding an answer:

r'^(?!summary)(?!config)[\w_-]+$'

This only works when you match the regex against a complete string. Note that it rejects any string that starts with one of the words, not just the words themselves:

>>> import re
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'config_test')  # returns None (no match), so the REPL prints nothing
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'confi_test')
<_sre.SRE_Match object at 0x21d34a8>

So to use your list, add one (?!word) group for each word after the ^ in your regex. These are called negative lookaheads.
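Rather than typing each lookahead by hand, the pattern can probably be generated from the list. The following is a minimal sketch, not part of the original answer: build_pattern and its whole_word flag are hypothetical names, and list_ignore stands in for the asker's listIgnore. With whole_word=True a $ is placed inside each lookahead so that only the exact words are rejected, which may be closer to what the question wants.

```python
import re

list_ignore = ['summary', 'config']  # stands in for the asker's listIgnore


def build_pattern(words, whole_word=False):
    # Hypothetical helper: one negative lookahead per excluded word.
    # re.escape guards against words containing regex metacharacters.
    tail = '$' if whole_word else ''
    lookaheads = ''.join('(?!{}{})'.format(re.escape(w), tail) for w in words)
    # \w already includes the underscore, so [\w-] equals the asker's [\w_-].
    return re.compile('^' + lookaheads + r'[\w-]+$')


prefix_pattern = build_pattern(list_ignore)                  # rejects anything starting with a listed word
exact_pattern = build_pattern(list_ignore, whole_word=True)  # rejects only the listed words themselves

print(prefix_pattern.match('config_test'))        # None
print(bool(prefix_pattern.match('confi_test')))   # True
print(bool(exact_pattern.match('config_test')))   # True
print(exact_pattern.match('config'))              # None
```

The prefix variant reproduces the behaviour shown in the answer; the exact-word variant is an assumption about what the asker may actually need.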

If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible. The regex will simply move ahead and match a substring that doesn't start with an excluded word. For example, searching 'summary' matches 'ummary'.
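One approach that might handle the in-string case is to let a match begin only at the start of a token and to reject the token only when it is exactly an excluded word. This is a sketch under assumptions, not something from the answer: find_allowed_tokens is a hypothetical helper, and a "token" is assumed to be a maximal run of [\w-] characters.

```python
import re

list_ignore = ['summary', 'config']  # stands in for the asker's listIgnore


def find_allowed_tokens(text, words):
    # Hypothetical helper: return every [\w-]+ token that is not an excluded word.
    excluded = '|'.join(re.escape(w) for w in words)
    pattern = re.compile(
        r'(?<![\w-])'                          # only start at the beginning of a token
        r'(?!(?:' + excluded + r')(?![\w-]))'  # reject if the whole token is an excluded word
        r'[\w-]+'                              # consume the token
    )
    return pattern.findall(text)


print(find_allowed_tokens('summary of config_test and config', list_ignore))
# ['of', 'config_test', 'and']
```

The lookbehind is what stops the engine from falling back to 'ummary': a match may not start in the middle of a token.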

Obviously, the more exclusions you add, the less efficient the regex will get. There are probably better ways to do it.
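One such alternative, sketched here as an assumption rather than as the answerer's method, is to keep the regex for the character check only and test the excluded words with an ordinary set lookup. is_allowed and TOKEN are hypothetical names, and this version rejects only exact matches of the listed words.

```python
import re

list_ignore = ['summary', 'config']  # stands in for the asker's listIgnore

# The asker's character class; \w already includes the underscore.
TOKEN = re.compile(r'[\w-]+')


def is_allowed(value, ignored=frozenset(list_ignore)):
    # Hypothetical helper: valid characters only, and not an excluded word.
    return TOKEN.fullmatch(value) is not None and value not in ignored


print(is_allowed('config'))       # False
print(is_allowed('config_test'))  # True
print(is_allowed('bad value'))    # False
```

A set lookup takes roughly constant time however many words are excluded, so the pattern itself never grows.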
