I have a dictionary of slang terms with their meanings, and I want to replace all of the slang terms in my text.
I have found a partially working solution: https://stackoverflow.com/a/2400577
For now, my code looks like this:
import re
myText = 'brb some sample text I lov u. I need some $$ for 2mw.'
dictionary = {
'brb': 'be right back',
'lov u': 'love you',
'$$': 'money',
'2mw': 'tomorrow'
}
pattern = re.compile(r'\b(' + '|'.join(re.escape(key) for key in dictionary.keys()) + r')\b')
result = pattern.sub(lambda x: dictionary[x.group()], myText)
print(result)
Output:
be right back some sample text I love you. I need some $$ for tomorrow.
As you can see, the $$ signs haven't changed, and I know this is due to the \b syntax. How can I change my regex to achieve my goal?
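A small check that may help pin down the cause: \b only matches at a position with a word char (letter, digit or underscore) on exactly one side. The snippet below is a sketch that reuses part of the sample text; it shows that a word-like key is found while $$ is not.

```python
import re

text = 'I need some $$ for 2mw.'

# '$' is not a word char, and neither is the space around it, so there is
# no word boundary before or after '$$' in this text.
dollar_match = re.search(r'\b\$\$\b', text)
print(dollar_match)  # None

# '2mw' starts and ends with word chars, so both \b anchors can match.
word_match = re.search(r'\b2mw\b', text)
print(word_match.group())  # 2mw
```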
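\b only matches between a word char and a non-word char (or a string edge), so it can never match right before or after $$ when $$ is surrounded by spaces, punctuation or the string edges. Replace the word boundaries with lookarounds that fail the match if there is a word char immediately before or after the search phrase: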
pattern = re.compile(r'(?<!\w)(' + '|'.join(re.escape(key) for key in dictionary.keys()) + r')(?!\w)')
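Here is a self-contained version of the question's script using the lookaround pattern. It is a sketch: the key sorting (longest first) is an extra safeguard not in the original answer, so that a longer phrase would win over a shorter key that is its prefix. It also uses m.group(1) to read the captured key.

```python
import re

my_text = 'brb some sample text I lov u. I need some $$ for 2mw.'
dictionary = {
    'brb': 'be right back',
    'lov u': 'love you',
    '$$': 'money',
    '2mw': 'tomorrow',
}

# Assumption (not in the original answer): try longer keys first.
keys = sorted(dictionary, key=len, reverse=True)
alternation = '|'.join(re.escape(key) for key in keys)

# (?<!\w): no word char right before; (?!\w): no word char right after.
pattern = re.compile(r'(?<!\w)(' + alternation + r')(?!\w)')

result = pattern.sub(lambda m: dictionary[m.group(1)], my_text)
print(result)
# be right back some sample text I love you. I need some money for tomorrow.
```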
The (?<!\w) negative lookbehind fails the match if there is a word char immediately to the left of the current location, and the (?!\w) negative lookahead fails the match if there is a word char immediately to the right of the current location.
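To see where the two approaches differ, the sketch below compares \b against the lookarounds for the $$ key on a few hypothetical inputs. The helper name matches is an illustrative choice, not part of the answer. Note that \b even matches inside a$$b, because the letters on each side create boundaries.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: return every substring the pattern matches."""
    return [m.group() for m in re.finditer(pattern, text)]

key = re.escape('$$')
word_boundary = r'\b' + key + r'\b'
lookarounds = r'(?<!\w)' + key + r'(?!\w)'

for text in ['some $$ here', 'a$$b', '$$.']:
    print(repr(text), matches(word_boundary, text), matches(lookarounds, text))
# 'some $$ here' [] ['$$']
# 'a$$b' ['$$'] []
# '$$.' [] ['$$']
```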
Replace (?<!\w) with (?<!\S) and (?!\w) with (?!\S) if you need to match search phrases only when they are between whitespace chars and/or the start/end of the string.
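The practical difference between the two variants shows up next to punctuation. In this sketch, with a reduced dictionary and sample sentence chosen for illustration, the whitespace-only version leaves lov u alone because it is followed by a period.

```python
import re

dictionary = {'lov u': 'love you', '$$': 'money'}
alternation = '|'.join(re.escape(key) for key in dictionary)

# No word char on either side vs. only whitespace (or string edges) on either side.
word_pat = re.compile(r'(?<!\w)(' + alternation + r')(?!\w)')
space_pat = re.compile(r'(?<!\S)(' + alternation + r')(?!\S)')

text = 'I lov u. I need some $$ now'
word_result = word_pat.sub(lambda m: dictionary[m.group(1)], text)
space_result = space_pat.sub(lambda m: dictionary[m.group(1)], text)
print(word_result)   # I love you. I need some money now
print(space_result)  # I lov u. I need some money now
```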