Reputation:
I am trying to use a negative lookahead to replace every part of a string that does not match a pattern:
regexPattern = '((?!*' + 'word1|word2|word3' + ').)*$'
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
return re.sub(regexPattern, "P", mytext)
#Expected Correct Output: 'PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP'
#BAD Output: 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
I tried this, but it does not work (the string remains the same). How can I modify it? (I think this is a pretty difficult regex.)
Upvotes: 3
Views: 213
Reputation: 82949
You could use a two-stage approach: First, replace the characters that do match with some special character, then use that as a mask to replace all the other characters.
>>> text= 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
>>> p = 'word1|word2|word3'
>>> mask = re.sub(p, lambda m: 'X' * len(m.group()), text)
>>> mask
'jsdjsqd XXXXXdsqsqsXXXXXfjsdjsXXXXXsqdq'
>>> ''.join(t if m == 'X' else 'P' for (t, m) in zip(text, mask))
'PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP'
Of course, instead of X you might have to choose a different character, one that does not occur in the original string.
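If picking a safe mask character is a concern, one possible variation is to record the matched positions instead of writing a sentinel into the string. This is only a sketch of that idea, not part of the answer above; the function name `mask_outside_matches` and the `fill` parameter are made up for illustration.

```python
import re

def mask_outside_matches(text, pattern, fill="P"):
    """Replace every character not covered by a match of `pattern` with `fill`.

    Hypothetical helper: it tracks matched positions in a list of booleans,
    so no sentinel character has to be absent from the input.
    """
    keep = [False] * len(text)
    for m in re.finditer(pattern, text):
        # Mark every index inside this match as "keep as-is".
        for i in range(m.start(), m.end()):
            keep[i] = True
    return "".join(ch if k else fill for ch, k in zip(text, keep))

text = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(mask_outside_matches(text, 'word1|word2|word3'))
```

Because the mask lives in a separate list, an input that already contains X (or any other character) outside the matches is still replaced correctly.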
Upvotes: 0
Reputation: 627607
You can use
import re
regex = re.compile(r'(word1|word2|word3)|.', re.S)
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(regex.sub(lambda m: m.group(1) if m.group(1) else "P", mytext))
# => PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP
See the IDEONE demo
The regex is (word1|word2|word3)|. :
(word1|word2|word3) - either the word1, or word2, or word3 character sequences (captured into group 1)
| - or...
. - any character (incl. a newline, as the re.S DOTALL mode is on)
See the regex demo
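If the keywords come from a list rather than a hard-coded pattern, the same alternation can be built at run time. The following is a minimal sketch under that assumption; `replace_non_keywords` and its parameters are hypothetical names, and the keywords are treated as literal strings (hence `re.escape`).

```python
import re

def replace_non_keywords(text, words, fill="P"):
    """Keep literal occurrences of `words`; replace every other character with `fill`."""
    if not words:
        # Nothing to keep: every character is replaced.
        return fill * len(text)
    # Try longer keywords first so that e.g. 'word10' is not cut short by 'word1'.
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in ordered)
    # Group 1 captures a keyword; the bare dot consumes any other single character.
    regex = re.compile("(" + alternation + ")|.", re.S)
    return regex.sub(lambda m: m.group(1) if m.group(1) is not None else fill, text)

mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(replace_non_keywords(mytext, ['word1', 'word2', 'word3']))
```

The order of the alternatives matters: the keyword branch comes first, so at each position the regex only falls back to the single-character branch when no keyword starts there.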
Upvotes: 3