Reputation: 51807
As a result of the comments on my answer in this thread, I wanted to know what the speed difference is between the += operator and ''.join().
So what is the speed comparison between the two?
Upvotes: 98
Views: 100591
Reputation: 14520
Note: This benchmark was informal and is due to be redone because it doesn't show a full picture of how these methods perform with more realistically long strings. As mentioned in the comments by @Mark Amery, += is not reliably as fast as using f-strings, and str.join isn't as dramatically slower in realistic use cases.
These metrics are also likely outdated by the significant performance improvements introduced by subsequent CPython versions, and most notably, 3.11.
The existing answers are very well-written and researched, but here's another answer for the Python 3.6 era, since now we have literal string interpolation (AKA f-strings):
>>> import timeit
>>> timeit.timeit('f\'{"a"}{"b"}{"c"}\'', number=1000000)
0.14618930302094668
>>> timeit.timeit('"".join(["a", "b", "c"])', number=1000000)
0.23334730707574636
>>> timeit.timeit('a = "a"; a += "b"; a += "c"', number=1000000)
0.14985873899422586
Test performed using CPython 3.6.5 on a 2012 Retina MacBook Pro with an Intel Core i7 at 2.3 GHz.
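As a rough sketch of how the comparison could be rerun with longer pieces, the snippet below times the same three approaches using variables instead of literals. The function names and the piece_len knob are my own, not part of the original benchmark, and the numbers will vary by machine and CPython version.

```python
import timeit

def build_fstring(a, b, c):
    return f"{a}{b}{c}"

def build_join(a, b, c):
    return "".join([a, b, c])

def build_plus_equals(a, b, c):
    out = a
    out += b
    out += c
    return out

def compare(piece_len, number=10_000):
    # piece_len is a hypothetical knob: every piece is this many characters long
    a, b, c = "a" * piece_len, "b" * piece_len, "c" * piece_len
    results = {}
    for fn in (build_fstring, build_join, build_plus_equals):
        # the lambda adds the same call overhead to every variant
        results[fn.__name__] = timeit.timeit(lambda: fn(a, b, c), number=number)
    return results

for piece_len in (1, 100, 10_000):
    print(piece_len, compare(piece_len))
```

Passing the pieces in as arguments keeps the compiler from precomputing anything, so the timings reflect the work done at run time.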
Upvotes: 12
Reputation: 486
Algorithmically speaking, [ += ] creates a new string object on every step, so building a string from n pieces this way is O(n^2) in the worst case. With [ .join ], it is O(n).
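To make that concrete, here is a simplified cost model that only counts how many characters get copied. It is a sketch of the worst case, not a measurement: the helper names are made up for illustration, and CPython can sometimes extend a string in place when nothing else references it, so real += loops often do better than this.

```python
def chars_copied_by_repeated_concat(piece_lengths):
    """Worst case: every += copies the whole accumulated string plus the new piece."""
    copied = 0
    total = 0
    for length in piece_lengths:
        total += length   # length of the string after this step
        copied += total   # a brand-new string of that length is built
    return copied

def chars_copied_by_join(piece_lengths):
    """join sizes the result once, then copies each piece a single time."""
    return sum(piece_lengths)

pieces = [3] * 1000  # 1000 pieces of 3 characters each
print(chars_copied_by_repeated_concat(pieces))  # 3 * (1 + 2 + ... + 1000) = 1501500
print(chars_copied_by_join(pieces))             # 3000
```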
Upvotes: 1
Reputation: 21
If I understand correctly, for a list of k strings with n characters in total, the time complexity of join should be O(n log k), while the time complexity of classic concatenation should be O(nk).
These would be the same relative costs as merging k sorted lists (the efficient method is O(n log k), while the simple one, akin to concatenation, is O(nk)).
Upvotes: 1
Reputation: 15406
From: Efficient String Concatenation
Method 1:
def method1():
    out_str = ''
    for num in xrange(loop_count):
        out_str += 'num'
    return out_str
Method 4:
def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append('num')
    return ''.join(str_list)
Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.
String join is significantly faster than concatenation.
Why? Strings are immutable and can't be changed in place. To alter one, a new representation needs to be created (a concatenation of the two).
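For anyone who wants to try this on Python 3, here is a minimal adaptation of the two methods. It swaps xrange for range and picks an arbitrary loop_count, since the value used in the article isn't shown here. Treat the timings as a local data point only.

```python
import timeit

loop_count = 100_000  # assumed value; the article's setting is not shown in this answer

def method1():
    # repeated += on a string
    out_str = ''
    for num in range(loop_count):
        out_str += 'num'
    return out_str

def method4():
    # collect the pieces in a list, then join once at the end
    str_list = []
    for num in range(loop_count):
        str_list.append('num')
    return ''.join(str_list)

# both methods must build the same string for the comparison to be fair
assert method1() == method4()

for fn in (method1, method4):
    print(fn.__name__, timeit.timeit(fn, number=10))
```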
Upvotes: 138
Reputation: 129
I rewrote the last answer. Could you please share your opinion on the way I tested?
import time
# time.clock() was removed in Python 3.8; time.perf_counter() is its replacement
start1 = time.perf_counter()
for x in range(10000000):
    dog1 = ' and '.join(['spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam'])
end1 = time.perf_counter()
print("Time to run Joiner = ", end1 - start1, "seconds")
start2 = time.perf_counter()
for x in range(10000000):
    dog2 = 'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'
end2 = time.perf_counter()
print("Time to run + = ", end2 - start2, "seconds")
NOTE: This example is written in Python 3.5, where range() acts like the former xrange()
The output I got:
Time to run Joiner = 27.086106206103153 seconds
Time to run + = 69.79100515996426 seconds
Personally, I prefer ''.join([]) over the 'Plusser way' because it's cleaner and more readable.
Upvotes: 0
Reputation: 51807
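My original code was wrong. It appears that + concatenation is usually faster (especially with newer versions of Python on newer hardware).
The times are as follows. On each line, the first number is the list-append-and-join version (listcat) and the second is the += version (strcat):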
Iterations: 1,000,000
Python 3.3 on Windows 7, Core i7
String of len: 1 took: 0.5710 0.2880 seconds
String of len: 4 took: 0.9480 0.5830 seconds
String of len: 6 took: 1.2770 0.8130 seconds
String of len: 12 took: 2.0610 1.5930 seconds
String of len: 80 took: 10.5140 37.8590 seconds
String of len: 222 took: 27.3400 134.7440 seconds
String of len: 443 took: 52.9640 170.6440 seconds
Python 2.7 on Windows 7, Core i7
String of len: 1 took: 0.7190 0.4960 seconds
String of len: 4 took: 1.0660 0.6920 seconds
String of len: 6 took: 1.3300 0.8560 seconds
String of len: 12 took: 1.9980 1.5330 seconds
String of len: 80 took: 9.0520 25.7190 seconds
String of len: 222 took: 23.1620 71.3620 seconds
String of len: 443 took: 44.3620 117.1510 seconds
On Linux Mint, Python 2.7, some slower processor
String of len: 1 took: 1.8840 1.2990 seconds
String of len: 4 took: 2.8394 1.9663 seconds
String of len: 6 took: 3.5177 2.4162 seconds
String of len: 12 took: 5.5456 4.1695 seconds
String of len: 80 took: 27.8813 19.2180 seconds
String of len: 222 took: 69.5679 55.7790 seconds
String of len: 443 took: 135.6101 153.8212 seconds
And here is the code:
from __future__ import print_function
import time
def strcat(string):
    newstr = ''
    for char in string:
        newstr += char
    return newstr
def listcat(string):
    chars = []
    for char in string:
        chars.append(char)
    return ''.join(chars)
def test(fn, times, *args):
    start = time.time()
    for x in range(times):
        fn(*args)
    return "{:>10.4f}".format(time.time() - start)
def testall():
    strings = ['a', 'long', 'longer', 'a bit longer',
               '''adjkrsn widn fskejwoskemwkoskdfisdfasdfjiz oijewf sdkjjka dsf sdk siasjk dfwijs''',
               '''this is a really long string that's so long
it had to be triple quoted and contains lots of
superflous characters for kicks and gigles
@!#(*_#)(*$(*!#@&)(*E\xc4\x32\xff\x92\x23\xDF\xDFk^%#$!)%#^(*#''',
               '''I needed another long string but this one won't have any new lines or crazy characters in it, I'm just going to type normal characters that I would usually write blah blah blah blah this is some more text hey cool what's crazy is that it looks that the str += is really close to the O(n^2) worst case performance, but it looks more like the other method increases in a perhaps linear scale? I don't know but I think this is enough text I hope.''']
    for string in strings:
        print("String of len:", len(string), "took:", test(listcat, 1000000, string), test(strcat, 1000000, string), "seconds")
testall()
Upvotes: 11
Reputation: 15347
This is what silly programs are designed to test :)
Use plus
import time
if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a" + "b"
    end = time.clock()
    print "Time to run Plusser = ", end - start, "seconds"
Output of:
Time to run Plusser = 1.16350010965 seconds
Now with join....
import time
if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a".join("b")
    end = time.clock()
    print "Time to run Joiner = ", end - start, "seconds"
Output Of:
Time to run Joiner = 21.3877386651 seconds
So on Python 2.6 on Windows, I would say + is about 18 times faster than join :)
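One thing to keep in mind when reading the two snippets above: "a".join("b") does not build the same string as "a" + "b". join treats the receiver as the separator and the argument as the sequence of pieces. A quick check (Python 3 syntax):

```python
print("a".join("b"))        # b      (only one piece, so the separator is never used)
print("a".join("xyz"))      # xayaz  (separator goes between the characters)
print("".join(["a", "b"]))  # ab     (the join equivalent of "a" + "b")
print("a" + "b")            # ab
```

So a like-for-like comparison would time "".join(["a", "b"]) against "a" + "b".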
Upvotes: -3