Reputation: 3427
I have a list of patterns and a list of replacements. The pattern list contains repeated elements, but each occurrence corresponds to a different replacement.
txt='132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern=['GO','HOME','NOW','GO','NOW']
REPLACEMENT=['why','nope','later','aha','genes']
The desired output would be 132whyasmnopewokdsllatersdwkaha239genes
What's the most efficient way to accomplish the sequential replacement?
Upvotes: 3
Views: 282
Reputation: 2263
Using a dict reduces the number of items you need to iterate over, which may be valuable for some long inputs.
txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO','HOME','NOW','GO','NOW']
REPLACEMENT = ['why','nope','later','aha','genes']
x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
for k in x:
    txt = txt.replace(k,x[k], 1)
print(txt)
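A quick way to see what the loop above iterates over is to print the dict itself. The sketch below is not part of the original answer; it only inspects the mapping and the result for the sample data, and it assumes Python 3.7+, where dicts preserve insertion order. The name `result` is illustrative.

```python
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'

# Reversing both lists makes the first replacement for each pattern win,
# because later assignments to the same key overwrite earlier ones.
x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
print(x)  # {'NOW': 'later', 'GO': 'why', 'HOME': 'nope'}

result = txt
for k in x:
    # One replacement per distinct key, first occurrence only.
    result = result.replace(k, x[k], 1)
print(result)  # 132whyasmnopewokdsllatersdwkGO239NOW
```

With one entry per distinct pattern, the second GO and the second NOW stay in place for this input, so the result differs from the output requested in the question.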
Edit: for fun I added a benchmark to illustrate that reducing the number of items you need to iterate over may be valuable for some long inputs. What is most efficient isn't always apparent when you are using a trivial test data set.
#! /usr/bin/env python
# -*- coding: UTF8 -*-
def alpha(pattern, REPLACEMENT, txt):
    for a,b in zip(pattern,REPLACEMENT):
        txt=txt.replace(a,b,1)

def beta(pattern, REPLACEMENT, txt):
    for i,x in enumerate(pattern):
        txt = txt.replace(x,REPLACEMENT[i], 1)

def gamma(pattern, REPLACEMENT, txt):
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt = txt.replace(k,x[k], 1)

def delta(pattern, REPLACEMENT, txt):
    new_d = iter(REPLACEMENT)
    new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)

if __name__ == '__main__':
    import timeit, re
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    pattern = ['GO','HOME','NOW','GO','NOW']
    REPLACEMENT = ['why','nope','later','aha','genes']
    print("Trivial inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")
    txtcopy = txt
    patterncopy = pattern.copy()
    REPLACEMENTcopy = REPLACEMENT.copy()
    for _ in range(3):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)
    print("Small inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
    print("")
    txt = txtcopy
    pattern = patterncopy.copy()
    REPLACEMENT = REPLACEMENTcopy.copy()
    for _ in range(300):
        txt = txt + txtcopy
        pattern.extend(patterncopy)
        REPLACEMENT.extend(REPLACEMENTcopy)
    print("Larger inputs: len(pattern): {}, len(REPLACEMENT): {}, len(txt): {}".format(len(pattern), len(REPLACEMENT), len(txt)))
    print("alpha: ", timeit.timeit("alpha(pattern, REPLACEMENT, txt)", setup="from __main__ import alpha, txt, pattern, REPLACEMENT"))
    print("beta: ", timeit.timeit("beta(pattern, REPLACEMENT, txt)", setup="from __main__ import beta, txt, pattern, REPLACEMENT"))
    print("gamma: ", timeit.timeit("gamma(pattern, REPLACEMENT, txt)", setup="from __main__ import gamma, txt, pattern, REPLACEMENT"))
    print("delta: ", timeit.timeit("delta(pattern, REPLACEMENT, txt)", setup="from __main__ import delta, txt, pattern, REPLACEMENT"))
Results:
Trivial inputs: len(pattern): 5, len(REPLACEMENT): 5, len(txt): 33
alpha: 4.60048107800003
beta: 4.169088881999869
gamma: 5.7612637450001785
delta: 11.371387353000046
Small inputs: len(pattern): 20, len(REPLACEMENT): 20, len(txt): 132
alpha: 17.281149661999734
beta: 15.131949634000193
gamma: 7.339897444000144
delta: 26.50896787900001
Larger inputs: len(pattern): 1505, len(REPLACEMENT): 1505, len(txt): 9933
alpha: 18766.660852467998
beta: 17640.960064803
gamma: 64.01868645999639
delta: 901.3577002189995
So, for trivial inputs the enumerate solution is a tiny bit faster than zip and a lot faster than iter. When the length of the inputs is increased slightly, the cost of not removing duplicates starts to show and my solution runs in less than half the time. When a long input with lots of duplicates is run, @eatmeimadanish's solution takes about 275 times as long to complete as when duplicates are removed. Ouch.
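Timings like these are easier to interpret when each candidate also returns its result, so the output can be compared with the desired string before the speed is. A minimal sketch, assuming hypothetical function names `with_zip` and `with_dict` and an arbitrary `number=10000` repetitions (none of these come from the benchmark above):

```python
import timeit

def with_zip(patterns, replacements, text):
    # Replace the first remaining occurrence of each pattern, in list order.
    for pat, rep in zip(patterns, replacements):
        text = text.replace(pat, rep, 1)
    return text

def with_dict(patterns, replacements, text):
    # Collapse duplicate patterns into a dict, then replace once per key.
    mapping = dict(zip(reversed(patterns), reversed(replacements)))
    for pat in mapping:
        text = text.replace(pat, mapping[pat], 1)
    return text

patterns = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
replacements = ['why', 'nope', 'later', 'aha', 'genes']
text = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
expected = '132whyasmnopewokdsllatersdwkaha239genes'

candidates = {'zip': with_zip, 'dict': with_dict}
for name, func in candidates.items():
    result = func(patterns, replacements, text)
    elapsed = timeit.timeit(lambda: func(patterns, replacements, text), number=10000)
    # Print the name, whether the output matches the desired string, and the time taken.
    print(name, result == expected, round(elapsed, 4))
```

Passing a callable to `timeit.timeit` avoids the `from __main__ import ...` setup strings, at the cost of a small per-call overhead from the lambda.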
Upvotes: 0
Reputation: 3907
txt='132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern=['GO','HOME','NOW','GO','NOW']
REPLACEMENT=['why','nope','later','aha','genes']
for i,x in enumerate(pattern):
    txt = txt.replace(x,REPLACEMENT[i], 1)
For fun, here are the time tests, since the question asked for most efficient.
# Python 2 syntax (xrange, print statement)
import re
import time

pattern=['GO','HOME','NOW','GO','NOW']
REPLACEMENT=['why','nope','later','aha','genes']
t = time.time()
for z in xrange(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for a,b in zip(pattern,REPLACEMENT):
        txt=txt.replace(a,b,1)
print time.time() - t
t = time.time()
for z in xrange(1000000):
    txt2 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    for i,x in enumerate(pattern):
        txt2 = txt2.replace(x,REPLACEMENT[i], 1)
print time.time() - t
t = time.time()
for z in xrange(1000000):
    txt3 = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    x = dict(zip(reversed(pattern), reversed(REPLACEMENT)))
    for k in x:
        txt3 = txt3.replace(k,x[k], 1)
print time.time() - t
t = time.time()
for z in xrange(1000000):
    txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
    new_d = iter(REPLACEMENT)
    new_result = re.sub('\b' + '|'.join(pattern) + '\b', lambda _: next(new_d), txt)
print time.time() - t
Which results in:
2.57099986076
2.48500013351
3.50499987602
4.23699998856
As you can see, enumerate is slightly more efficient than zip, and the other two are not in the same ballpark.
Upvotes: 0
Reputation: 125
I think you should try this:
import re
txt = "132GOasmHOMEwokdslNOWsdwkGO239NOW"
pattern = ['GO','HOME','NOW','GO','NOW']
REPLACEMENT = ['why','nope','later','aha','genes']
# Lists are 0-indexed, and count=1 replaces only the first remaining match.
txt1 = re.sub(pattern[0], REPLACEMENT[0], txt, count=1)
txt2 = re.sub(pattern[1], REPLACEMENT[1], txt1, count=1)
txt3 = re.sub(pattern[2], REPLACEMENT[2], txt2, count=1)
txt4 = re.sub(pattern[3], REPLACEMENT[3], txt3, count=1)
FINAL_TEXT = re.sub(pattern[4], REPLACEMENT[4], txt4, count=1)
print(FINAL_TEXT)
And the output:
"132whyasmnopewokdsllatersdwkaha239genes"
Upvotes: -1
Reputation: 1260
You can loop through the two lists at the same time, and only replace the first instance of the pattern each time:
for a,b in zip(pattern,REPLACEMENT):
    txt=txt.replace(a,b,1)
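Each `replace` call above rescans the string from the start. If the patterns are known to occur in the text in the same order as in the list (an assumption, and a different technique from the loop above), one possible variation is to search forward from the end of the previous match and build the result in a single pass. The function name `replace_in_order` is hypothetical:

```python
def replace_in_order(text, patterns, replacements):
    """Replace each pattern once, searching forward from the previous match."""
    pieces = []
    cursor = 0
    for pat, rep in zip(patterns, replacements):
        idx = text.find(pat, cursor)
        if idx == -1:
            continue  # pattern not found after the cursor; leave the text as-is
        pieces.append(text[cursor:idx])  # untouched text before the match
        pieces.append(rep)
        cursor = idx + len(pat)
    pieces.append(text[cursor:])  # remainder after the last match
    return "".join(pieces)

txt = '132GOasmHOMEwokdslNOWsdwkGO239NOW'
pattern = ['GO', 'HOME', 'NOW', 'GO', 'NOW']
REPLACEMENT = ['why', 'nope', 'later', 'aha', 'genes']
print(replace_in_order(txt, pattern, REPLACEMENT))
# 132whyasmnopewokdsllatersdwkaha239genes
```

Because the search never revisits replaced text, a replacement string that happens to contain a later pattern is not replaced again, which the plain loop does not guarantee.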
Upvotes: 3