Reputation: 8488
In my mind, passing in a result list rather than using a return statement should let you skip a copy of the list. It's less readable, but I thought it would be faster.
So I tested it:
import timeit

TEST_DATA = 10000000
ITERATIONS = 1000

time_taken = timeit.timeit(
    """
def get_big_list(test_data):
    return [random.random() for _ in range(100000)]

return_list = get_big_list(test_data)
""",
    setup='import random; test_data = {0}'.format(TEST_DATA), number=ITERATIONS)
print("return_list", time_taken / ITERATIONS)
# ----------------------------- SECOND TEST -----------------------------------
time_taken = timeit.timeit(
    """
def get_big_list(test_data, return_list):
    [return_list.append(random.random()) for _ in range(100000)]

return_list = []
get_big_list(test_data, return_list)
""",
    setup='import random; test_data = {0}'.format(TEST_DATA), number=ITERATIONS)
print("return_list", time_taken / ITERATIONS)
I was really surprised by the numbers I received and I don't understand why the return statement is faster.
$ python large_list_return.py
('return_list', 0.013130356788635254)
('return_list', 0.022573610067367553)
$ python3 large_list_return.py
return_list 0.016797171516984236
return_list 0.02749005461903289
Another interesting thing I found was that it was faster to use a for loop to append than it was to use a list comprehension.
time_taken = timeit.timeit(
    """
def get_big_list(test_data, return_list):
    for index in range(100000):
        return_list.append(random.random())

return_list = []
get_big_list(test_data, return_list)
""",
    setup='import random; test_data = {0}'.format(TEST_DATA), number=ITERATIONS)
It returned these numbers:
python large_list_return.py
('return_list', 0.0133241708278656)
('return_list', 0.019642770051956176)
python3 large_list_return.py
return_list 0.017075919962022453
return_list 0.024502045304980128
Upvotes: 1
Views: 69
Reputation: 881635
Python doesn't make copies unless you explicitly ask for one (e.g. by taking a slice of a list) -- returning a list just gives the caller another reference to a list that's been created anyway. append, on the other hand, may need to allocate more memory and copy the previous contents -- it's amortized O(1), but it's still a lot more work.
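You can see the no-copy behavior directly; this is just a small illustration (the names are made up):

def build():
    # the list is created once; return hands back a reference, not a copy
    return [1, 2, 3]

a = build()
b = a        # another name for the same list -- still no copy
b.append(4)
print(a)     # [1, 2, 3, 4]: a and b refer to the same object

c = a[:]     # an explicit slice is what actually copies
c.append(5)
print(a)     # still [1, 2, 3, 4]: the copy is independent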
Moreover, since you use a list comprehension for what should be a loop:

[return_list.append(random.random()) for _ in range(100000)]

in your second implementation you're also creating another list (which you then toss away), one made up of a hundred thousand occurrences of None (the return value of append). That's why the for loop is faster there -- it avoids uselessly creating that redundant list.
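That throwaway list is easy to see at a smaller scale (an illustrative sketch, not the original benchmark):

out = []
# the comprehension collects append's return values: a list of None
side_effects = [out.append(n) for n in range(5)]
print(out)           # [0, 1, 2, 3, 4]
print(side_effects)  # [None, None, None, None, None]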
If you're looking for optimizations, consider hoisting namespace lookups, as I mentioned above in a comment. Specifically, using:
def get_big_list(test_data):
    r = random.random
    return [r() for _ in range(100000)]

on my dated laptop, this takes 12 milliseconds vs your first version's 17 (which tells us that each random.random lookup takes about 50 nanoseconds).
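A quick way to check the hoisting effect yourself (a sketch with a smaller list so it runs fast; absolute numbers will vary by machine):

import timeit

plain = timeit.timeit(
    "[random.random() for _ in range(10000)]",
    setup="import random", number=100)
hoisted = timeit.timeit(
    "[r() for _ in range(10000)]",
    setup="import random; r = random.random", number=100)
print("plain  ", plain)
print("hoisted", hoisted)  # typically a bit faster: one lookup instead of 10000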
Upvotes: 3