user5497885

Negative pattern matching regex in Python

I am trying to use a negative lookahead to replace every character that is not part of a match of a pattern:

import re

regexPattern = '((?!*' + 'word1|word2|word3' + ').)*$'
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(re.sub(regexPattern, "P", mytext))

Expected output:  'PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP'

Actual output:  'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'

I tried this, but it does not work: the string comes back unchanged. How can I modify it? (I think this is a pretty difficult regex.)

Upvotes: 3

Views: 213

Answers (2)

tobias_k

Reputation: 82949

You could use a two-stage approach: first, replace the substrings that do match with some special character, then use the result as a mask to replace all the other characters.

>>> import re
>>> text = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
>>> p = 'word1|word2|word3'
>>> mask = re.sub(p, lambda m: 'X' * len(m.group()), text)
>>> mask
'jsdjsqd XXXXXdsqsqsXXXXXfjsdjsXXXXXsqdq'
>>> ''.join(t if m == 'X' else 'P' for (t, m) in zip(text, mask))
'PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP'

Of course, instead of X you may have to choose a different character, one that does not occur in the original string; otherwise an X outside any match would be kept instead of being replaced.
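A variant of the same masking idea that avoids choosing a sentinel character at all is to record the spans of the matches with re.finditer and keep only the characters inside those spans. This is a minimal sketch; the helper name mask_non_matches and its fill parameter are hypothetical, not part of the re module.

```python
import re

def mask_non_matches(text, pattern, fill='P'):
    """Replace every character outside a match of `pattern` with `fill`.

    Hypothetical helper: a sentinel-free version of the two-stage mask.
    """
    keep = [False] * len(text)
    for m in re.finditer(pattern, text):
        # first stage: mark every position covered by this match as one to keep
        for i in range(m.start(), m.end()):
            keep[i] = True
    # second stage: use the boolean mask instead of an 'X'-filled string
    return ''.join(c if k else fill for c, k in zip(text, keep))

text = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(mask_non_matches(text, 'word1|word2|word3'))
# PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP
```

Because the mask is a list of booleans, an X (or any other character) already present in the input needs no special care.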

Upvotes: 0

Wiktor Stribiżew

Reputation: 627607

You can use

import re
regex = re.compile(r'(word1|word2|word3)|.', re.S)
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(regex.sub(lambda m: m.group(1) if m.group(1) else "P", mytext))
# => PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP

See the IDEONE demo

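The regex (word1|word2|word3)|. consists of:

  • (word1|word2|word3) - a capturing group (Group 1) matching the character sequence word1, word2 or word3
  • | - or...
  • . - any single character (including a newline, since the re.S / DOTALL flag is on)

Because the alternatives are tried left to right, a word is consumed as a whole whenever it starts at the current position, and the lambda returns it unchanged; any other character falls through to . and is replaced with P.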

See the regex demo
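If the words come from a list, the same pattern can be built programmatically. The sketch below (the helper mask_except and its fill parameter are hypothetical) escapes each word with re.escape so regex metacharacters are matched literally, and puts longer words first, because the alternation takes the first branch that matches: with (word1|word10), the text word10 would only match word1.

```python
import re

def mask_except(text, words, fill='P'):
    """Keep every occurrence of `words` and replace each other character with `fill`.

    Hypothetical helper generalizing the (word1|word2|word3)|. approach;
    assumes `words` contains only non-empty strings.
    """
    if not words:
        # nothing to keep: every character is replaced
        return fill * len(text)
    # Escape each word so characters like '.' or '+' match literally, and try
    # longer words first so that e.g. 'word10' wins over its prefix 'word1'.
    alternation = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    regex = re.compile('(' + alternation + ')|.', re.S)
    # Group 1 is None when the '.' branch matched a single other character.
    return regex.sub(lambda m: m.group(1) if m.group(1) is not None else fill, text)

print(mask_except('jsdjsqd word1dsqsqsword2fjsdjsword3sqdq', ['word1', 'word2', 'word3']))
# PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP
```

With ['word1', 'word10'] and the text xword10y this returns Pword10P; without the sorting step it would return Pword1PP.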

Upvotes: 3
