Reputation: 162
I have always been told that python's native append is a slow function and should be avoided in for loops. However after a couple of small tests I have found it performs more poorly than a numpy array when iterating over it with a for loop:
def pythonAppend(n):
x = []
for i in range(n):
x.append(i)
return x
%timeit pythonAppend(1000000)
def numpyConstruct(n):
x = np.zeros(n)
for i in range(n):
x[i] = i
return x
%timeit numpyConstruct(1000000)
Python Time: 179 ms
Numpy Time: 189 ms
n = 1000000
x = pythonAppend(n)
arr = numpyConstruct(n)
order = arr[:]; np.random.shuffle(order); order = list(order.astype(int))
def listAccess(x):
for i in range(len(x)):
x[i] = i/3.
return x
def listAccessOrder(x, order):
for i in order:
x[i] = i/3.
return x
%timeit listAccess(x)
%timeit listAccess(arr)
%timeit listAccessOrder(x, order)
%timeit listAccessOrder(arr, order)
%timeit arr / 3.
python -sequential: 175 ms
numpy -sequential: 198 ms
python -shuffled access: 2.08 s
numpy -shuffled access: 2.15 s
numpy -vectorised access: 1.92ms
Upvotes: 0
Views: 1155
Reputation: 231325
Python lists are not linked lists. The data buffer of the list object contains links/pointers to objects (else where in memory). So fetching the ith
element is easy. The buffer also has growth headroom, so append
is a simple matter of inserting the new element's link. The buffer has to be reallocated periodically, but Python manages that seamlessly.
A numpy array also has a 1d data buffer, but it contains numeric values (more generally the bytes as required by the dtype
). Fetching is easy, but it has to create a new python object to 'contain' the value ('unboxing'). Assignment also requires conversion from the python object to bytes that will be stored.
Generally we've found that making a new array by appending to a list (with one np.array
call at the end) is competitive with assignment to a preallocated array.
Iteration through a numpy array is normally slower than iterating through a list.
What we strongly discourage is using np.append
(or some variant) to grow an array iteratively. That makes a new array each time, with full copy.
numpy arrays are fast when the iteration is done in the compiled code. Normally that's with the whole-array methods. The c
code can iterate over the data buffer, operating on the values directly, rather than invoking python object methods at each step.
Upvotes: 2