Reputation:
I have a text file full of adverbs and their replacements, like this:
adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3
I want the adverbs to be replaced in my text.
Example:
'Hello adverbe1 this is a test' should become 'Hello replacement1 this is a test'
I am running out of ideas. My code so far:
import re

adverbes = open("list_adverbes_replacement.txt", encoding="utf-8")
list_adverbes = []
list_replacement = []
for ad in adverbes.readlines():
    if ad != '' and ad.split('|')[0].strip(' ')[-3:] == 'ent':
        list_adverbes.append(ad.split('|')[0].strip(' '))
        list_replacement.append(ad.split('|')[1])
pattern = r"(\s+\b(?:{}))\b".format("|".join(list_adverbes))
data = re.sub(pattern, r"\1", data)
I couldn't find a way to replace each adverb with its corresponding replacement.
The file list_adverbes_replacement.txt contains the text I gave at the beginning. I am looking for a regex solution; I just don't know what I am missing.
Upvotes: 1
Views: 73
Reputation: 626691
You can initialize the dictionary with adverbs and replacements using
dct = {}
with open(r'__t.txt', 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()
The dct will look like {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}.
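The loading loop above raises an IndexError on a blank line or on a line without a '|'. A minimal sketch of a more tolerant loader, assuming such lines should simply be skipped (the helper name load_replacements and the in-memory sample are made up for illustration):

```python
import io

def load_replacements(lines):
    """Build {adverb: replacement} from 'adverb |replacement' lines."""
    mapping = {}
    for line in lines:
        # partition splits on the first '|' only and never raises
        key, sep, value = line.partition('|')
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a separator
        mapping[key.strip()] = value.strip()
    return mapping

# io.StringIO stands in for the real file so the sketch runs on its own
sample = io.StringIO(
    "adverbe1 |replacement1\n\nadverbe2 |replacement2\nno separator here\n"
)
dct = load_replacements(sample)
print(dct)
```

With a real file, the same function can be called as load_replacements(f) inside the with open(...) block.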
Then, pip install triegex (or use this solution from Speed up millions of regex replacements in Python 3) to streamline dynamic regex building, and use
import triegex, re

dct = {}
with open(PATH_TO_FILE_WITH_SEARCH_AND_REPLACEMENTS, 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

test = 'Hello adverbe1 this is a test'
pattern = re.compile(fr'\b{triegex.Triegex(*dct.keys()).to_regex()}')
print( pattern.sub(lambda x: dct[x.group()], test) )
# => Hello replacement1 this is a test
The pattern for this demo dictionary is \b(?:adverbe(?:1\b|2\b|3\b)|~^(?#match nothing)), and it matches adverbe1, adverbe2, adverbe3 as whole words.
The lambda x: dct[x.group()], the replacement argument to re.sub, gets the corresponding replacement value.
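If installing triegex is not an option, roughly the same effect can be had with the standard library alone. This is only a sketch, assuming the list is small enough that a plain alternation is fast enough; the inline dct and the function name replace_adverbs are made up for the demo:

```python
import re

dct = {
    'adverbe1': 'replacement1',
    'adverbe2': 'replacement2',
    'adverbe3': 'replacement3',
}

# Longest keys first, so a longer key is tried before a shorter key that is its prefix
keys = sorted(dct, key=len, reverse=True)

# re.escape protects keys that contain regex metacharacters
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')

def replace_adverbs(text):
    # The callback looks up whichever key actually matched
    return pattern.sub(lambda m: dct[m.group()], text)

print(replace_adverbs('Hello adverbe1 this is a test'))
```

For very large lists the trie-based pattern is likely to be faster, since a flat alternation tries the keys one by one.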
Upvotes: 0
Reputation: 3285
A simple and concise approach: build a dictionary of key/value pairs for your replacements.
Then replace them using re.sub by matching each word, looking it up in the dictionary, and defaulting to the word itself if it is not in the dictionary.
import re

d = dict()
with open('list_adverbes_replacement.txt', 'r') as fo:
    for line in fo:
        splt = line.split('|')
        d[splt[0].strip()] = splt[1].strip()

s = 'Hello adverbe1 this is a test, adverbe2'
s = re.sub(r'(\w+)', lambda m: d.get(m.group(), m.group()), s)
print(s)
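One thing to keep in mind with the \w+ approach: each match is a single word, so a key that contains a space can never be looked up as a whole. A small sketch of that behaviour, using an in-memory dictionary and a hypothetical multi-word key 'well done' purely for illustration:

```python
import re

# 'well done' is a made-up multi-word key to show the limitation
d = {'adverbe1': 'replacement1', 'well done': 'bravo'}

s = 'Hello adverbe1, well done'
# Every word is passed to the callback; unknown words are returned unchanged
result = re.sub(r'\w+', lambda m: d.get(m.group(), m.group()), s)
print(result)  # Hello replacement1, well done
```

Single-word keys such as the adverbs in the question are unaffected by this.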
Upvotes: 1
Reputation: 18406
Given adverbs like this:
adverbs = '''adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3'''
Create a dictionary out of it where key is the adverb and value is the replacement text.
adverbsDict = {item[0].strip():item[1].strip() for item in map(lambda x: x.split('|'), adverbs.split('\n'))}
Now iterate over the keys and call replace on the text for each key with its corresponding value:
text = 'Hello adverbe1 this is a test'
for key in adverbsDict:
    text = text.replace(key, adverbsDict[key])
OUTPUT:
'Hello replacement1 this is a test'
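Note that str.replace works on substrings, so a key that happens to be the start of a longer word gets replaced there too. A minimal sketch of the difference, assuming whole-word matching is what is wanted (the sample text with 'adverbe10' is invented for the demo):

```python
import re

adverbsDict = {'adverbe1': 'replacement1'}
text = 'adverbe1 and adverbe10'

# Plain str.replace also rewrites the prefix of 'adverbe10'
plain = text
for key, value in adverbsDict.items():
    plain = plain.replace(key, value)
print(plain)    # replacement1 and replacement10

# Word boundaries restrict the match to whole words
bounded = text
for key, value in adverbsDict.items():
    # A callable replacement avoids backslash interpretation in the value
    bounded = re.sub(r'\b' + re.escape(key) + r'\b', lambda m, v=value: v, bounded)
print(bounded)  # replacement1 and adverbe10
```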
Upvotes: 0