Reputation: 62155
I've seen there are actually two (maybe more) ways to concatenate lists in Python:
One way is to use the extend() method:
a = [1, 2]
b = [2, 3]
b.extend(a)
The other is to use the plus (+) operator:
b += a
Now I wonder: which of those two options is the 'pythonic' way to do list concatenation, and is there a difference between the two? (I've looked up the official Python tutorial but couldn't find anything about this topic.)
Upvotes: 362
Views: 126325
Reputation: 3349
+= only works if the statement would also work with an = sign, i.e. the left-hand side can be assigned to. This is because a += b actually becomes a = a.__iadd__(b) under the hood. So if a is something that can't be assigned to (either by syntax or semantics), such as a function call or an element of an immutable container, the += version will fail.
You can't assign to a function call (the syntax of Python forbids it), so you also can't += a function call's result directly:
list1 = [5, 6]
list2 = [7, 8]
def get_list():
    return list1
get_list().extend(list2) # works
get_list() += list2 # SyntaxError: can't assign to function call
A perhaps weirder case is when the list is an element of an immutable container, e.g. a tuple:
my_tuple = ([1, 2], [3, 4], [5, 6])
my_tuple[0].extend([10, 11]) # works
my_tuple[0] += [10, 11] # TypeError: 'tuple' object does not support item assignment
Since you can't do my_tuple[0] = something, you also can't do +=.
To sum up, you can use += if you can use =.
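One subtlety of the tuple case is worth seeing in action: the in-place extend happens first, and only the store back into the tuple fails. The following is a minimal sketch that reuses my_tuple from above; the name error is introduced here purely for illustration.

```python
my_tuple = ([1, 2], [3, 4], [5, 6])
error = None

try:
    # The list is extended in place, then the assignment back
    # into the tuple raises TypeError.
    my_tuple[0] += [10, 11]
except TypeError as exc:
    error = exc

# The exception was raised, yet the list inside the tuple was still mutated.
print(type(error).__name__)
print(my_tuple[0])
```

So += on an element of a tuple both fails and has a side effect, which is one more reason to prefer .extend() there.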
Upvotes: 81
Reputation: 51
The += operator is negligibly faster than list.extend(), if at all, which is confirmed by dalonsoa's answer. You're literally exchanging a method call for two other operations.
>>> import dis
>>> dis.dis("_list.extend([1])")
  1           0 LOAD_NAME                0 (_list)
              2 LOAD_METHOD              1 (extend)
              4 LOAD_CONST               0 (1)
              6 BUILD_LIST               1
              8 CALL_METHOD              1
             10 RETURN_VALUE
>>> dis.dis("_list += [1]")
  1           0 LOAD_NAME                0 (_list)
              2 LOAD_CONST               0 (1)
              4 BUILD_LIST               1
              6 INPLACE_ADD
              8 STORE_NAME               0 (_list)
             10 LOAD_CONST               1 (None)
             12 RETURN_VALUE
Note that this does not apply to numpy arrays, since numpy arrays are not Python lists at all and should not be treated as such (Lance Ruo Zhang's answer).
The += will not work for lists in tuples, most likely because of the STORE_SUBSCR operation (Jann Poppinga's answer). Note, however, that in this case list.__iadd__() (being a method call) works perfectly fine.
The += does not create a new list (ding's answer).
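A small sketch of what "does not create a new list" means in practice. The names alias, same_after_iadd and same_after_add are made up for this illustration; the point is only to contrast += with plain + followed by rebinding.

```python
a = [1, 2]
alias = a            # a second name for the same list object

a += [3]             # in place: the object behind alias changes too
same_after_iadd = a is alias

a = a + [4]          # builds a new list and rebinds a; alias is left behind
same_after_add = a is alias

print(alias, a, same_after_iadd, same_after_add)
```

Anything else that holds a reference to the list sees the result of += (and of extend), but not the result of a = a + ....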
I apologise for posting all this as an answer, I do not have enough rep to comment.
Upvotes: 2
Reputation: 2119
The .extend() method on lists works with any iterable*, += works with some but can get funky.
import numpy as np
l = [2, 3, 4]
t = (5, 6, 7)
l += t
l
[2, 3, 4, 5, 6, 7]
l = [2, 3, 4]
t = np.array((5, 6, 7))
l += t
l
array([ 7, 9, 11])
l = [2, 3, 4]
t = np.array((5, 6, 7))
l.extend(t)
l
[2, 3, 4, 5, 6, 7]
Python 3.6
*pretty sure .extend() works with any iterable but please comment if I am incorrect
Edit: "extend()" changed to "The .extend() method on lists" Note: David M. Helmuth's comment below is nice and clear.
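To back up the "any iterable" footnote without needing numpy, here is a minimal sketch using only built-in types. The variable plus_failed is introduced here for illustration.

```python
l = [2, 3, 4]

l.extend(x * 10 for x in range(2))   # a generator expression
l.extend("ab")                       # a string yields its characters
l.extend({"k": 1})                   # a dict yields its keys
l += (5, 6)                          # += also accepts a tuple

# Plain + is stricter: it only concatenates a list with another list.
try:
    l + (7,)
    plus_failed = False
except TypeError:
    plus_failed = True

print(l, plus_failed)
```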
Upvotes: 6
Reputation: 79
ary = ary + ext creates a new list object, then copies the data from both "ary" and "ext" into it.
ary.extend(ext) appends references to the elements of "ext" to the end of the existing "ary" list in place, resulting in fewer memory transactions.
As a result, when a list is built up repeatedly, .extend can be orders of magnitude faster, and it needs little memory beyond the list being extended and the list it's being extended with.
╰─➤ time ./list_plus.py
./list_plus.py 36.03s user 6.39s system 99% cpu 42.558 total
╰─➤ time ./list_extend.py
./list_extend.py 0.03s user 0.01s system 92% cpu 0.040 total
The first script also uses over 200MB of memory, while the second one doesn't use any more memory than a 'naked' python3 process.
Having said that, the in-place addition does seem to do the same thing as .extend.
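The two scripts are not shown in this answer, so the following is only a guess at the kind of loop that produces such a gap: building a list by repeated ary = ary + ext versus repeated ary.extend(ext). The function names and the chunk sizes are assumptions made for this sketch.

```python
import time

def build_with_plus(chunks):
    ary = []
    for ext in chunks:
        ary = ary + ext      # new list every iteration, copying everything so far
    return ary

def build_with_extend(chunks):
    ary = []
    for ext in chunks:
        ary.extend(ext)      # grows the same list in place
    return ary

# Hypothetical workload: 2000 chunks of 10 items each.
chunks = [[i] * 10 for i in range(2000)]

for build in (build_with_plus, build_with_extend):
    t0 = time.perf_counter()
    result = build(chunks)
    elapsed = time.perf_counter() - t0
    print("{} {:.4f} s, {} items".format(build.__name__, elapsed, len(result)))
```

Both functions produce the same list; the difference in running time grows with the number of chunks, because the + version copies the accumulated list on every pass.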
Upvotes: 7
Reputation: 544
This will work
t = ([],[])
t[0].extend([1,2])
while this won't
t = ([],[])
t[0] += [1,2]
The reason is that += finishes with an assignment. If you look at the long version:
t[0] = t[0].__iadd__([1,2])
you can see that it stores the result back into the tuple, which is not possible (the list itself has already been extended in place by the time the TypeError is raised). Using .extend() only modifies an object in the tuple, which is allowed.
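One way to check where the failure comes from is to look at the compiled bytecode. This is a sketch that assumes CPython, where the store into a subscript is compiled to an opcode named STORE_SUBSCR; opcode names are an implementation detail and can differ between versions. The helper opnames is made up for this example.

```python
import dis

def opnames(source):
    """Return the opcode names CPython generates for a source string."""
    return [ins.opname for ins in dis.get_instructions(source)]

augmented = opnames("t[0] += [1, 2]")
method = opnames("t[0].extend([1, 2])")

# Only the += form ends with a store back into t[0].
print("STORE_SUBSCR" in augmented)
print("STORE_SUBSCR" in method)
```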
Upvotes: 0
Reputation: 711
From the CPython 3.5.2 source code: No big difference.
static PyObject *
list_inplace_concat(PyListObject *self, PyObject *other)
{
    PyObject *result;

    result = listextend(self, other);
    if (result == NULL)
        return result;
    Py_DECREF(result);
    Py_INCREF(self);
    return (PyObject *)self;
}
Upvotes: 5
Reputation: 10920
I've looked up the official Python tutorial but couldn't find anything about this topic
This information happens to be buried in the Programming FAQ:
... for lists, __iadd__ [i.e. +=] is equivalent to calling extend on the list and returning the list. That's why we say that for lists, += is a "shorthand" for list.extend
You can also see this for yourself in the CPython source code: https://github.com/python/cpython/blob/v3.8.2/Objects/listobject.c#L1000-L1011
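The FAQ statement can be checked from Python directly. A minimal sketch, with the names a, b and returned chosen here for illustration:

```python
a = [1, 2]
b = [1, 2]

returned = a.__iadd__([3, 4])   # what += calls for lists
b.extend([3, 4])                # extend itself returns None

print(returned is a)            # __iadd__ hands back the very same list
print(a == b)                   # and both lists end up with the same contents
```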
Upvotes: 4
Reputation: 277
Actually, there are differences among the three options: ADD, INPLACE_ADD and extend. The first is always slower, while the other two are roughly the same.
With this information, I would rather use extend, which is faster than ADD and seems to me more explicit about what you are doing than INPLACE_ADD.
Try the following code a few times (for Python 3):
import time

def test():
    x = list(range(10000000))
    y = list(range(10000000))
    z = list(range(10000000))

    # INPLACE_ADD
    t0 = time.process_time()
    z += x
    t_inplace_add = time.process_time() - t0

    # ADD
    t0 = time.process_time()
    w = x + y
    t_add = time.process_time() - t0

    # Extend
    t0 = time.process_time()
    x.extend(y)
    t_extend = time.process_time() - t0

    print('ADD {} s'.format(t_add))
    print('INPLACE_ADD {} s'.format(t_inplace_add))
    print('extend {} s'.format(t_extend))
    print()

for i in range(10):
    test()
ADD 0.3540440000000018 s
INPLACE_ADD 0.10896000000000328 s
extend 0.08370399999999734 s
ADD 0.2024550000000005 s
INPLACE_ADD 0.0972940000000051 s
extend 0.09610200000000191 s
ADD 0.1680199999999985 s
INPLACE_ADD 0.08162199999999586 s
extend 0.0815160000000077 s
ADD 0.16708400000000267 s
INPLACE_ADD 0.0797719999999913 s
extend 0.0801490000000058 s
ADD 0.1681250000000034 s
INPLACE_ADD 0.08324399999999343 s
extend 0.08062700000000689 s
ADD 0.1707760000000036 s
INPLACE_ADD 0.08071900000000198 s
extend 0.09226200000000517 s
ADD 0.1668420000000026 s
INPLACE_ADD 0.08047300000001201 s
extend 0.0848089999999928 s
ADD 0.16659500000000094 s
INPLACE_ADD 0.08019399999999166 s
extend 0.07981599999999389 s
ADD 0.1710910000000041 s
INPLACE_ADD 0.0783479999999912 s
extend 0.07987599999999873 s
ADD 0.16435900000000458 s
INPLACE_ADD 0.08131200000001115 s
extend 0.0818660000000051 s
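If you want to repeat the measurement with less noise, the standard timeit module is an alternative to hand-rolled timers. This is only a sketch: the list size and the labels are choices made here, and number=1 is used so that the setup code rebuilds x and y before every single measurement (otherwise the in-place variants would keep growing x).

```python
import timeit

setup = "x = list(range(100_000)); y = list(range(100_000))"
statements = {
    "ADD": "w = x + y",
    "INPLACE_ADD": "x += y",
    "extend": "x.extend(y)",
}

results = {}
for label, stmt in statements.items():
    # repeat() re-runs the setup for each of the 5 runs; keep the best time.
    results[label] = min(timeit.repeat(stmt, setup=setup, repeat=5, number=1))
    print("{} {:.6f} s".format(label, results[label]))
```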
Upvotes: 11
Reputation: 760
According to Python for Data Analysis:
“Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.” Thus,
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
is faster than the concatenative alternative:
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
Upvotes: -1
Reputation: 401
I would say there is a difference when numpy is involved. (I realize the question asks about concatenating two lists, not numpy arrays, but since this can trip up beginners such as me, I hope it helps someone who lands on this post looking for a solution.) For example:
import numpy as np
a = np.zeros((4,4,4))
b = []
b += a
fails with the error
ValueError: operands could not be broadcast together with shapes (0,) (4,4,4)
whereas
b.extend(a)
works perfectly.
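The mechanism behind this can be reproduced without numpy. The sketch below assumes CPython, where for list += other the right operand's reflected addition (__radd__) is tried before the list's own in-place concatenation. Greedy is a hypothetical stand-in for an array type, written only for this illustration; it is not how numpy is implemented.

```python
class Greedy:
    """Hypothetical array-like: iterable, and defines reflected addition."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __radd__(self, other):
        # Element-wise sum, loosely imitating what an array type would do.
        return [o + s for o, s in zip(other, self.items)]

l = [2, 3, 4]
l += Greedy((5, 6, 7))       # Greedy.__radd__ takes over; l is rebound to its result
added = l

l = [2, 3, 4]
l.extend(Greedy((5, 6, 7)))  # extend just iterates over the object
extended = l

print(added)
print(extended)
```

With a real numpy array the reflected addition tries to broadcast, which is where the ValueError above comes from; extend never goes down that path.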
Upvotes: 9
Reputation: 319541
The only difference on a bytecode level is that the .extend way involves a function call, which is slightly more expensive in Python than the INPLACE_ADD.
It's really nothing you should be worrying about, unless you're performing this operation billions of times. It is likely, however, that the bottleneck would lie somewhere else.
Upvotes: 315
Reputation: 3956
You can't use += on a non-local variable (a variable that is neither local to the function nor global):
def main():
    l = [1, 2, 3]
    def foo():
        l.extend([4])   # only reads the name l, then mutates the list
    def boo():
        l += [5]        # assigns to l, so the compiler treats l as local to boo
    foo()
    print(l)
    boo()  # this will fail
main()
It's because in the extend case the compiler loads the variable l using the LOAD_DEREF instruction, but for += it uses LOAD_FAST, and you get UnboundLocalError: local variable 'l' referenced before assignment
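In Python 3 the nonlocal statement tells the compiler that the name belongs to the enclosing function, which makes += usable in the nested function. A minimal sketch; the function names broken and fixed are made up for this example.

```python
def broken():
    l = [1, 2, 3]

    def boo():
        l += [5]          # l is treated as a local of boo and is unbound here

    try:
        boo()
    except UnboundLocalError:
        return "UnboundLocalError"
    return l

def fixed():
    l = [1, 2, 3]

    def boo():
        nonlocal l        # l now refers to the variable in fixed()
        l += [5]

    boo()
    return l

print(broken())
print(fixed())
```

With extend no declaration is needed, because the nested function never assigns to the name.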
Upvotes: 252