Cyrano Chatziantoniou
Cyrano Chatziantoniou

Reputation: 25

Why isn't preallocating faster than using append?

I was trying to create an example where preallocating a numpy array is quicker than using .append on a list, but ended up proving the opposite.

import numpy as np
import time

t = time.time()
emptyList   = []
for i in range(int(1e8)):
    emptyList.append(1)
print("Append: ", time.time() - t)

t = time.time()
emptyList   = np.empty(int(1e8))
for i in range(int(1e8)):
    emptyList[i] = 1
print("Preallocate: ", time.time() - t)

Which gives the following output:

Append:  12.733097314834595
Preallocate:  19.008652925491333

I'm very confused as to why appending to a list is quicker in this case. I've also tried using random numbers instead of '1' everywhere, but got the same result.

Upvotes: 1

Views: 1639

Answers (3)

Marine Galantin
Marine Galantin

Reputation: 2279

It is not possible to edit @Prune example, I offer this snippet of code in the attempt to propose an inclusive and complete example of her point:

import time

nb_rep = 1e7
t = time.time()
emptyList = []
for i in range(int(nb_rep)):
    emptyList.append(1)
print("Append: ", time.time() - t)

t = time.time()
emptyList = [1] * int(nb_rep)
for i in range(int(nb_rep)):
    emptyList[i] = 1
print("Preallocate: ", time.time() - t)

t = time.time()
emptyList = np.empty(int(nb_rep))
for i in range(int(nb_rep)):
    emptyList[i] = 1
print("Preallocate Numpy: ", time.time() - t)

the result locally:

Append:  0.6769435405731201
Preallocate:  0.5485126972198486
Preallocate Numpy:  0.9414911270141602

Upvotes: 0

Prune
Prune

Reputation: 77867

Please note that you're not using the same class in both examples. You're comparing append for the built-in Python type list against assignment to -- not a list, but to allocate and assign to a NumPy array.

I would expect them to be much closer, but that's not the issue: you have two items here, allocation and assignment.

When I run your code as given, I get

Append:       16.296499013900757
Preallocate:  16.835283517837524

When I change the second loop to eliminate the NumPy overhead:

emptyList   = [0] * (int(1e8))

Result:

Append:       23.76796054840088
Preallocate:  11.050438404083252

Upvotes: 4

Tommy
Tommy

Reputation: 13672

It's because python lists aren't actually linked lists; they are implemented underneath as arrays: https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.

They are being smarter than simply increasing each time a single append is requested, similar to how TCP works with a multiplicative increase.

Other reading: https://www.geeksforgeeks.org/internal-working-of-list-in-python/

How is Python's List Implemented?

Upvotes: 4

Related Questions