Reputation: 79
text = "One sentence with one (two) three, but mostly one. And twos."
Desired result: A sentence with A (B) C, but mostly A. And twos.
Words should be replaced only on an exact match in lookup_dict. Therefore "two" in "twos" should not be replaced, as there is an additional letter in the word. Yet words next to spaces, commas, parentheses and periods should be replaced.
import re
lookup_dict = {'var': ["one", "two", "three"]}
match_dict = {'var': ["A", "B", "C"]}
var_dict = {}
for i,v in enumerate(lookup_dict['var']):
    var_dict[v] = match_dict['var'][i]
xpattern = re.compile('|'.join(var_dict.keys()))
result = xpattern.sub(lambda x: var_dict[x.group()], text.lower())
result: A sentence with A (B) C, but mostly A. and Bs.
Can I achieve the desired output without adding every possible combination of words + adjacent characters to the dictionaries? This seems unnecessarily complicated:
lookup_dict = {'var': ['one ', 'one,', '(one)', 'one.', 'two ', 'two,', '(two)', 'two.', 'three ', 'three,', '(three)', 'three.']}
...
result = xpattern.sub(lambda x: var_dict[x.group()] if x.group() in lookup_dict['var'] else x.group(), text.lower())
Upvotes: 1
Views: 1346
Reputation: 6343
OK, finally finished a solution! It's super verbose and I wouldn't let it babysit my kids, but here it is anyway. The other answer is probably a better solution :)
First, there's a better way to map the words you want to replace to their replacements:
lookup_dict = {"one": "A", "two": "B", "three": "C"}
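If you already have the two parallel lists from the question, a minimal sketch of turning them into this flat mapping with zip (the variable names are the ones from the question):

```python
# The two parallel lists from the question.
lookup_dict = {'var': ["one", "two", "three"]}
match_dict = {'var': ["A", "B", "C"]}

# Pair each word with its replacement by position; zip stops at the shorter list.
var_dict = dict(zip(lookup_dict['var'], match_dict['var']))
print(var_dict)  # {'one': 'A', 'two': 'B', 'three': 'C'}
```

This replaces the enumerate loop and index lookup with a single expression.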
It looks like what you really want is to match whole words but ignore punctuation and case. For that, we can strip the punctuation from each word as we try to match it, and then rebuild the original token with the letter "A" instead of "one", etc., keeping the surrounding punctuation.
import re
text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}
# Make a regex that matches any character that is not a letter.
regex = re.compile('[^a-zA-Z]')
textSplit = text.split()
for i in range(0, len(textSplit)):
    # Get rid of punctuation.
    word = regex.sub('', textSplit[i]).lower()
    if word in lookup_dict:
        # Fetch the right letter from the lookup_dict.
        letter = lookup_dict[word]
        # Find where the word is in the punctuated string (super flaky, I know).
        # Search the lowercased token so a capitalized "One" is found too.
        wInd = textSplit[i].lower().find(word)
        # Just making sure the word was actually found before rebuilding it.
        if wInd != -1:
            # Rebuilding the string with punctuation.
            newWord = textSplit[i][0:wInd] + letter + textSplit[i][wInd+len(word):]
            textSplit[i] = newWord
print(" ".join(textSplit))
Not a great solution, I know, but I pushed through it. Take it as a bit of fun, so please no downvotes haha.
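A sketch of a slightly less flaky variant of the same split-and-rebuild idea, assuming each whitespace-separated token is leading punctuation, then letters, then trailing punctuation. The helper name replace_token and the token_re pattern are hypothetical, introduced only for illustration; splitting the token with a regex avoids the separate str.find call.

```python
import re

text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}

# Split a token into leading non-letters, the letters, and trailing non-letters.
token_re = re.compile(r'^([^a-zA-Z]*)([a-zA-Z]+)([^a-zA-Z]*)$')

def replace_token(token):
    """Hypothetical helper: swap the word inside a token, keep its punctuation."""
    m = token_re.match(token)
    # Tokens with no letters, or with punctuation inside the word, stay unchanged.
    if m is None:
        return token
    prefix, word, suffix = m.groups()
    replacement = lookup_dict.get(word.lower())
    if replacement is None:
        return token
    return prefix + replacement + suffix

result = " ".join(replace_token(t) for t in text.split())
print(result)  # A sentence with A (B) C, but mostly A. And twos.
```

Like the original, this collapses runs of whitespace into single spaces because of split() and join().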
Upvotes: 1
Reputation: 617
import re
w = "Where are we one today two twos them"
lookup_dict = {"one":"1", "two":"2", "three":"3"}
pattern = re.compile(r'\b(' + '|'.join(lookup_dict.keys()) + r')\b')
output = pattern.sub(lambda x: lookup_dict[x.group()], w)
print(output)
This gives 'Where are we 1 today 2 twos them'.
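The text in the question starts with a capital "One", which the pattern above will not match because the dictionary keys are lowercase. Rather than lowercasing the whole text (which also turns "And" into "and"), one option is a sketch like the following, assuming all keys are lowercase: compile with re.IGNORECASE and look up the lowercased match.

```python
import re

text = "One sentence with one (two) three, but mostly one. And twos."
lookup_dict = {"one": "A", "two": "B", "three": "C"}

# re.escape guards against keys that contain regex metacharacters.
pattern = re.compile(
    r'\b(' + '|'.join(map(re.escape, lookup_dict)) + r')\b',
    re.IGNORECASE,
)

# Look up the lowercased match so "One" finds the "one" key;
# the rest of the text keeps its original case.
result = pattern.sub(lambda m: lookup_dict[m.group().lower()], text)
print(result)  # A sentence with A (B) C, but mostly A. And twos.
```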
Basically:
Updated your dictionary so that each word is a key that maps directly to its replacement.
Created a regex that matches any of the keys in your dictionary, \b(every|key|in|your|dictionary)\b. The \b word boundaries on either side make sure a key only matches as a whole word, i.e. when it is next to a non-word character such as a space, comma, parenthesis or period (or the start/end of the string), so "two" inside "twos" is left alone.
Then used pattern.sub to replace every match with its value from the dictionary.
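To see what \b accepts as a boundary, here is a small check on made-up sample text (the string is purely illustrative): spaces and punctuation count as boundaries, but letters, digits and underscores do not.

```python
import re

samples = "two (two) two, two. twos two_ two1 twosome"

# \b is zero-width: it matches a position, not a character. Here it requires
# that "two" is not directly preceded or followed by a word character
# (a letter, digit, or underscore).
demo = re.sub(r'\btwo\b', 'B', samples)
print(demo)  # B (B) B, B. twos two_ two1 twosome
```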
Upvotes: 4