Reputation: 466
I have an array of frequent words in the text I am analyzing, and I intend to use regex fuzzy matching to replace any misspellings of them.
I know I could loop over them like:
import regex as re

edits = 1
my_arr = ['word1', 'word2', 'word3']
my_text = 'this is my text with wrd1 in it'
for word in my_arr:
    # e.g. '(word1){e<=1}': match word1 with at most one edit
    r_pattern = '(' + word + '){e<=' + str(edits) + '}'
    my_text = re.sub(r_pattern, word, my_text)
But is there a way to do this with a single call to regex.sub? That is, could my pattern look something like:
r_pattern = '(word1|word2|word3){e<=1}'
Upvotes: 1
Views: 219
Reputation: 118
Here is my solution:
import regex as re

def repl(matchObj):
    # lastgroup is the name of the group that matched, i.e. the correct word
    return str(matchObj.lastgroup)

edits = 1
my_arr = ['word1', 'word2', 'word3']
my_text = 'this is my text with wrd3 in it'

# Build one alternation with a named fuzzy group per word
r_pattern = ""
for i in range(len(my_arr)):
    if i == len(my_arr)-1:
        r_pattern += '(?P<' + my_arr[i] + '>' + my_arr[i] + '){e<=' + str(edits) + '}'
    else:
        r_pattern += '(?P<' + my_arr[i] + '>' + my_arr[i] + '){e<=' + str(edits) + '}|'

r = re.compile(r_pattern)
my_text = re.sub(r, repl, my_text)
print(my_text)
It uses the lastgroup attribute of the match object, which holds the name of the last matched capturing group. Because each group is named after the word it matches, the callback can return that name as the replacement. This should scale to a larger array if you need it to, assuming there isn't a limit on re.compile that gets in your way. Hope this helps.
Python docs for lastgroup: https://docs.python.org/2/library/re.html
Handy regex editor to help with future problems: https://regex101.com
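A more compact variant of the same idea, as a rough sketch rather than a tested drop-in: the pattern is built with join, and the group names are generated (w0, w1, ...) instead of reusing the words, on the assumption that some words may not be valid group names. The helper names build_fuzzy_pattern and fuzzy_replace are made up for this example, and regex is imported inside the function only so the pattern builder can be tried without the third-party module installed.

```python
import re  # standard library, used here only for re.escape


def build_fuzzy_pattern(words, edits=1):
    """Return (pattern, names): one alternation with a named group per word.

    Group names are generated (w0, w1, ...) rather than taken from the
    words, so words that are not valid identifiers can still be used.
    """
    names = {f"w{i}": word for i, word in enumerate(words)}
    alternatives = (
        f"(?P<{name}>{re.escape(word)}){{e<={edits}}}"
        for name, word in names.items()
    )
    return "|".join(alternatives), names


def fuzzy_replace(text, words, edits=1):
    """Replace near-misses of each word with a single sub() call."""
    import regex  # third-party module (pip install regex); needed for {e<=N}

    pattern, names = build_fuzzy_pattern(words, edits)
    # lastgroup names the alternative that matched; map it back to the word.
    return regex.sub(pattern, lambda m: names[m.lastgroup], text)
```

Calling fuzzy_replace(my_text, my_arr, edits) would then take the place of the loop and the re.sub call above. Fuzzy matches can start or end on neighbouring characters, so it is worth checking the result on real text before relying on it.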
Upvotes: 1