Reputation: 92924
I have a regular expression '[\w_-]+' which allows alphanumeric characters, underscores, and hyphens.
I have a set of words in a Python list that I don't want to allow:
listIgnore = ['summary', 'config']
What changes do I need to make to the regex?
P.S.: I am new to regex.
Upvotes: 3
Views: 201
Reputation: 123648
>>> import re
>>> line = "This is a line containing a summary of config changes"
>>> listIgnore = ['summary', 'config']
>>> patterns = "|".join(listIgnore)
>>> print(re.findall(r'\b(?!(?:' + patterns + r'))[\w_-]+', line))
['This', 'is', 'a', 'line', 'containing', 'a', 'of', 'changes']
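A possible refinement, not part of the original answer: the pattern above also skips any word that merely starts with an ignored word (for example "configuration"), and it pastes the words into the regex unescaped. The sketch below uses a hypothetical helper, find_allowed_words, that escapes each word with re.escape and uses (?<![\w-]) and (?![\w-]) instead of \b, so a hyphenated token is treated as one word and only exact matches are rejected. The _ in the character class is dropped because \w already includes the underscore.

```python
import re


def find_allowed_words(text, ignore):
    """Return word-like tokens (letters, digits, underscores, hyphens),
    skipping tokens that are exactly one of the ignored words."""
    # re.escape guards against ignored words containing regex metacharacters.
    alternation = "|".join(re.escape(word) for word in ignore)
    # (?<![\w-]) only lets a match start at the beginning of a token.
    # The negative lookahead rejects the token only when an ignored word is
    # followed by something that cannot continue the token, so
    # "configuration" is kept while "config" on its own is skipped.
    pattern = r"(?<![\w-])(?!(?:" + alternation + r")(?![\w-]))[\w-]+"
    return re.findall(pattern, text)


print(find_allowed_words("a summary of the configuration, not the config",
                         ["summary", "config"]))
# ['a', 'of', 'the', 'configuration', 'not', 'the']
```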
Upvotes: 3
Reputation: 3009
This question intrigued me, so I set out to find an answer:
'^(?!summary)(?!config)[\w_-]+$'
Note that this only works if you match the regex against a complete string:
>>> import re
>>> print(re.match(r'^(?!summary)(?!config)[\w_-]+$', 'config_test'))
None
>>> re.match(r'^(?!summary)(?!config)[\w_-]+$', 'confi_test')
<re.Match object; span=(0, 10), match='confi_test'>
So to use your list, just add one more (?!<word here>) right after the ^ for each word you want to exclude. These are called negative lookaheads. Here's some good info.
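A minimal sketch of building that anchored pattern from the list instead of typing each lookahead by hand. The re.escape call is an addition of mine, there only in case a word contains regex metacharacters; for plain words like these it changes nothing.

```python
import re

listIgnore = ['summary', 'config']

# One negative lookahead per ignored word, all placed right after ^,
# which reproduces '^(?!summary)(?!config)[\w_-]+$'.
lookaheads = ''.join('(?!' + re.escape(word) + ')' for word in listIgnore)
pattern = re.compile('^' + lookaheads + r'[\w_-]+$')

for candidate in ['config_test', 'confi_test', 'summary', 'notes']:
    # match() returns None when one of the lookaheads rejects the string.
    print(candidate, bool(pattern.match(candidate)))
```

As in the session above, 'config_test' is rejected because it starts with "config", while 'confi_test' and 'notes' are accepted.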
If you're trying to match within a string (i.e. without the ^ and $), then I'm not sure it's possible with this pattern alone: the regex will just match a substring that doesn't trigger any of the lookaheads, for example ummary from summary.
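A quick check of that behaviour, plus one way around it taken from the previous answer: putting \b in front of the lookaheads stops a match from starting in the middle of a word. Like that answer, this still skips words that merely start with an ignored word, such as "configuration".

```python
import re

# The anchored pattern from above with ^ and $ dropped.
UNANCHORED = r'(?!summary)(?!config)[\w_-]+'
# Same lookaheads, but a match may only start at a word boundary.
WORD_START = r'\b(?!summary)(?!config)[\w_-]+'

text = 'a summary of config changes'
# Without an anchor the engine simply retries one character later,
# so "summary" yields the substring "ummary" and "config" yields "onfig".
unanchored_hits = re.findall(UNANCHORED, text)
# \b prevents a match from starting inside a word.
word_start_hits = re.findall(WORD_START, text)
print(unanchored_hits)  # ['a', 'ummary', 'of', 'onfig', 'changes']
print(word_start_hits)  # ['a', 'of', 'changes']
```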
Obviously, the more exclusions you add, the less efficient it gets. There are probably better ways to do it.
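One alternative, not taken from either answer: keep the regex as it was in the question and do the exclusion in Python with a set, so each token costs one hash lookup no matter how many words are ignored. The helper name allowed_words is hypothetical.

```python
import re


def allowed_words(text, ignore):
    """Tokenize with the question's character class, then drop
    tokens that exactly equal one of the ignored words."""
    ignored = set(ignore)
    # Find every token first, then filter with a constant-time set lookup.
    return [tok for tok in re.findall(r'[\w_-]+', text) if tok not in ignored]


print(allowed_words("This is a line containing a summary of config changes",
                    ['summary', 'config']))
```

The comparison is case-sensitive, so "Summary" would be kept; lowercasing both the set entries and each token before the lookup would change that.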
Upvotes: 2