There are several Stack Overflow questions about ways to append together all the elements of many lists. What I don't see is a definitive answer about the relative performance of these different methods. Also, a slight performance gain sometimes comes at the cost of some readability, so it would be useful to know several different approaches.
The task: Given a list that contains an arbitrary number of lists, each containing an arbitrary number of elements, form a single list with all the elements: first all the elements from the first list, then all the elements of the second list, and so on. This is a shallow append: if a list has an element that is itself a list, DON'T extract those individual elements. E.g.
append_all([ [11, 12, 13], [21, 22], [31, [320, 329], 33, 34] ])
=>
[11, 12, 13, 21, 22, 31, [320, 329], 33, 34]
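For concreteness, a minimal sketch of the desired behavior, using a plain list.extend loop (one of the approaches timed in the answer below):

def append_all(LL):
    """Shallow append: concatenate the sub-lists of LL into one new list."""
    result = []
    for L in LL:
        result.extend(L)  # nested elements such as [320, 329] stay intact
    return result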
Upvotes: 3
Views: 184
Here are timings for several ways to append together multiple lists.
They are shown from fastest to slowest.
Python 2.7 (CPython - running inside Autodesk Maya 2014), Windows 7 64-bit, Intel Core i7-3770K @ 3.5 GHz.
import timeit

def p_timeit_min(msg, expr_str, number, setup):
    times = timeit.repeat(expr_str, number=number, setup=setup, repeat=3)
    print('{0:22} => {1:6.3f}'.format(msg, min(times)))

n = 1000
timeit.repeat('1+1', number=10000)  # "dummy" -- I'm in an environment where the first timeit call is erratic in performance.
# NOTE: setup_0 starts with ';' so it can be appended to an 'n = ...' statement.
# itertools is imported here because timeit runs each statement in its own namespace.
setup_0 = '; import operator; import itertools; L1 = list(range(n)); LL = [[10 * x + v for v in L1] for x in range(n)]'
print
p_timeit_min('map+extend 100', 'all = []; map(all.extend, LL)', number=n, setup='n = 100'+setup_0)
p_timeit_min('for+extend 100', """
all = []
for L in LL:
    all.extend(L)
""", number=n, setup='n = 100'+setup_0)
p_timeit_min('extend 100', 'all = []; [all.extend(L) for L in LL]', number=n, setup='n = 100'+setup_0)
# reduce with [] initializer, to avoid need to wrap each L in list().
p_timeit_min('reduce+iadd 100 []', 'all = reduce(operator.iadd, LL, [])', number=n, setup='n = 100'+setup_0)
p_timeit_min('filter extend 100', 'all = []; filter( lambda x: False, iter(all.extend(L) for L in LL) )', number=n, setup='n = 100'+setup_0)
# WARNING: without the list() wrapper around list_, iadd would MODIFY the first element of LL in place
# (this version supplies no [] to accumulate the results into), which may not be desired.
# See the demonstration at the end of this answer.
p_timeit_min('reduce+iadd 100 list()', 'all = reduce(operator.iadd, (list(list_) for list_ in LL))', number=n, setup='n = 100'+setup_0)
p_timeit_min('chain 100', 'all = list(itertools.chain(*LL))', number=n, setup='n = 100'+setup_0)
p_timeit_min('comprehension 100', 'all = [x for list_ in LL for x in list_]', number=n, setup='n = 100'+setup_0)
p_timeit_min('nested for append 100', """
all = []
for L in LL:
    for x in L:
        all.append(x)
""", number=n, setup='n = 100'+setup_0)
p_timeit_min('sum 100', 'all = sum(LL, [])', number=n, setup='n = 100'+setup_0)
print
p_timeit_min('map+extend 200', 'all = []; map(all.extend, LL)', number=n, setup='n = 200'+setup_0)
p_timeit_min('for+extend 200', """
all = []
for L in LL:
    all.extend(L)
""", number=n, setup='n = 200'+setup_0)
p_timeit_min('extend 200', 'all = []; [all.extend(L) for L in LL]', number=n, setup='n = 200'+setup_0)
p_timeit_min('reduce+iadd 200 []', 'all = reduce(operator.iadd, LL, [])', number=n, setup='n = 200'+setup_0)
p_timeit_min('filter extend 200', 'all = []; filter( lambda x: False, iter(all.extend(L) for L in LL) )', number=n, setup='n = 200'+setup_0)
p_timeit_min('reduce+iadd 200 list()', 'all = reduce(operator.iadd, (list(list_) for list_ in LL))', number=n, setup='n = 200'+setup_0)
p_timeit_min('chain 200', 'all = list(itertools.chain(*LL))', number=n, setup='n = 200'+setup_0)
p_timeit_min('comprehension 200', 'all = [x for list_ in LL for x in list_]', number=n, setup='n = 200'+setup_0)
p_timeit_min('nested for append 200', """
all = []
for L in LL:
    for x in L:
        all.append(x)
""", number=n, setup='n = 200'+setup_0)
p_timeit_min('sum 200', 'all = sum(LL, [])', number=n, setup='n = 200'+setup_0)
print
Output:
map+extend 100         =>  0.062
for+extend 100         =>  0.064   ** within margin of error of first place, but slower on average
extend 100             =>  0.066
reduce+iadd 100 []     =>  0.063   ** see the "200" case for why this isn't placed higher in the list
filter extend 100      =>  0.078
reduce+iadd 100 list() =>  0.105   ** ignore this - use the better "reduce" above
chain 100              =>  0.127
comprehension 100      =>  0.250
nested for append 100  =>  0.672
sum 100                =>  1.424
These times should be ~4x longer for O(n) algorithms: the "200" case is 200 lists x 200 elements, so 4x as many total elements as the "100" case.
OBSERVE: the top five variations do significantly better than that - only about 3x longer for 4x as many elements. This is because each sub-list is twice as long while the number of sub-lists only doubled, so the fixed per-sub-list overhead (e.g. one extend call per list) is amortized over more elements:
map+extend 200         =>  0.187
for+extend 200         =>  0.190
extend 200             =>  0.194
reduce+iadd 200 []     =>  0.204
filter extend 200      =>  0.217
reduce+iadd 200 list() =>  0.311   ** ignore this - use the better "reduce" above
chain 200              =>  0.426
comprehension 200      =>  0.931
nested for append 200  =>  2.676
sum 200                => 13.432
ANALYSIS: The top four solutions are not substantially different, provided each sub-list has many elements.
A nested list comprehension takes 4x as long as the best solution. On the other hand, it is still O(n) in the total number of elements, and in many situations this is 4x a small enough time value that it doesn't matter. sum is the real outlier: it took ~9x longer for 4x as many elements, because its running time is quadratic in the total number of elements (see the sketch below).
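Why sum scales so badly: sum(LL, []) evaluates acc = acc + L for each sub-list, and list + builds a brand-new list each time, re-copying everything accumulated so far. A rough sketch of the equivalent loop (my reading of sum's semantics, not the timed code):

acc = []
for L in LL:
    acc = acc + L  # allocates a new list and copies len(acc) + len(L)
                   # elements every iteration => quadratic total work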
CONCLUSION: if this is a performance-critical factor for your situation, use list.extend with any means of looping, or reduce(operator.iadd, LL, []). Runner-up choices: itertools.chain at 2x the cost, or [ .. for .. for .. ] (a nested list comprehension) at 4x the cost.
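To make the conclusion concrete, a minimal sketch of all four idioms side by side (Python 2.7, matching the timed code; in Python 3, reduce must be imported from functools):

import itertools
import operator

LL = [[11, 12, 13], [21, 22], [31, [320, 329], 33, 34]]

all_1 = []
for L in LL:                           # winner: extend, via any means of looping
    all_1.extend(L)

all_2 = reduce(operator.iadd, LL, [])  # winner: reduce with a fresh [] accumulator

all_3 = list(itertools.chain(*LL))     # runner-up, ~2x the cost
all_4 = [x for L in LL for x in L]     # runner-up, ~4x the cost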
CAVEAT: This is one test, on one machine, on one implementation, of Python 2.7.
It also assumes you start with lists and need the result as a list; if you have or need sequences/generators/iterators instead, that might change the balance, perhaps in favor of itertools.chain (see the sketch below).
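For example, if a lazy iterator is acceptable downstream (no flattened list is ever materialized), itertools.chain.from_iterable avoids both the copying and the *LL argument unpacking. A minimal sketch; process() is a hypothetical stand-in for whatever consumes the elements:

import itertools

for x in itertools.chain.from_iterable(LL):  # lazy: yields one element at a time
    process(x)  # hypothetical consumer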
TODO: A test with many SHORT lists.
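One more footnote, on the WARNING comment in the timing code: a quick demonstration of how reduce(operator.iadd, LL) without the [] initializer (and without list() wrappers) mutates its input:

import operator

LL = [[1, 2], [3], [4, 5]]
flat = reduce(operator.iadd, LL)  # Python 2.7: reduce is a builtin
print flat    # [1, 2, 3, 4, 5]
print LL[0]   # [1, 2, 3, 4, 5] -- the first sub-list was modified in place!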
Upvotes: 4