Reputation: 2944
I have a python for loop over a string, that have to decide char-by-char if the current char should stay or be removed.
Let's say I have a function fun(character,state)
which receives a char, and some state parameter, and returns True
if the char should be kept, and a new state
. for example:
def fun(c,state):
if state%5==0:
return False, state+1
else:
return True, state+2
I have two ideas how to iterate (sequentially) over the string with this function, but I can't decide between them (or others) in terms of runtime complexity and space usage tradeoff:
def option1(string):
state = 0
i =0
while i<len(string):
answer, state = fun(string[i],state)
if not answer:
string = string[:i]+string[i+1:]
else:
i+=1
return string
def option2(string):
result = ''
state = 0
for c in string:
answer, state = fun(c,state)
if answer:
result+=c
return result
Does anyone has a better approach/ solution?
edit: some timing results:
on a long string ( 67976 chars) the results are:
looks like eliminating the i'th char with string[:i]+string[i+1:]
is not a good idea.
Upvotes: 0
Views: 107
Reputation: 2729
In CPython, a good performance rule of thumb is 'avoid function calls in tight loops'. The overhead tends to take a significant amount of time compared to simple iterative computations. In your case, a 'brute force' approach is to simply inline the logic in the loop that traverses your string:
def option4(strng):
result = []
state = 0
for c in strng:
if state % 5 == 0:
state += 1
else:
result.append(c)
state += 2
return ''.join(result)
This may feel a little clunky, a more 'pythonic' approach might be to use a python generator that takes an iterable and initial state as input and yields the matching members.
def filtergen(iter, initstate):
state = initstate
for c in iter:
if state % 5 == 0:
state += 1
else:
state += 2
yield c
def option5(strng):
return ''.join(filtergen(strng, 0))
This reduces call overhead - you only need as many calls as matching elements and as a side benefit, frees the caller from the need to keep track of state. Additionally, you no longer need the intermediate list. The filter remains generic - it works on any kind of iterable, rather than just strings.
Putting all of these together and measuring performance:
s = 'T0kzcZ0x8VQY8ulgFKU8D1MlIUULRsVsNZMnXbjliUEES6sEIVUzpjxlHLG59z' * 1200
def fun(c, state):
if state % 5 == 0:
return False, state + 1
else:
return True, state + 2
def option3(strng):
result = []
state = 0
for c in strng:
answer, state = fun(c, state)
if answer:
result.append(c)
return ''.join(result)
def option4(strng):
result = []
state = 0
for c in strng:
if state % 5 == 0:
state += 1
else:
result.append(c)
state += 2
return ''.join(result)
def filtergen(iter, initstate):
state = initstate
for c in iter:
if state % 5 == 0:
state += 1
else:
state += 2
yield c
def option5(strng):
return ''.join(filtergen(strng, 0))
import timeit
print("string length is " + str(len(s)))
print(timeit.timeit('option3(s)', globals=globals(), number=100))
print(timeit.timeit('option4(s)', globals=globals(), number=100))
print(timeit.timeit('option5(s)', globals=globals(), number=100))
print(option3(s) == option(4) == option5(s))
On my dinky laptop, the output looks like this (times are in seconds):
pvg /tmp ➤ python3 s.py
string length is 74400
2.488818967998668
1.3291878789968905
1.268602224998176
True
The 'inline logic' and generator both easily outperform the naive call-per-element approach.
Upvotes: 1
Reputation: 85482
String concatenation with +=
is not recommended.
Better use a list and join it.
def option2(string):
result = []
state = 0
for c in string:
answer, state = fun(c, state)
if answer:
result.append(c)
return ''.join(result)
This a good explanation why.
While +=
is optimized in CPython (for certain cases?), it can be really slow in other implementations such as PyPy, for example.
Upvotes: 3