Reputation: 1036
Sometimes you need to merge a nested list into a single flat list (similar to np.flatten()).
When a list of lists like the one below is given, you have to flatten it:
a = [[j for j in range(0, 10)] for i in range(0, 10000)]
You have two kinds of solutions for it: itertools.chain.from_iterable and functools.reduce.
import itertools
from functools import reduce
%timeit list(itertools.chain.from_iterable(a))
%timeit reduce(lambda x, y: x + y, a)
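In case you want to reproduce this outside IPython (since %timeit is an IPython magic), here is a rough standalone version using the standard timeit module; the exact numbers will vary by machine:
import itertools
import timeit
from functools import reduce

a = [[j for j in range(0, 10)] for i in range(0, 10000)]

# Keep the repeat count small, because the reduce version is slow.
print(timeit.timeit(lambda: list(itertools.chain.from_iterable(a)), number=5))
print(timeit.timeit(lambda: reduce(lambda x, y: x + y, a), number=5))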
Which one do you think is faster, and how much faster than the other? itertools.chain.from_iterable is 1000 times faster or more (and the gap grows as the list gets longer).
If somebody knows why this happens, please let me know.
Thanks as always for your support and help.
Upvotes: 2
Views: 1001
Reputation: 96172
Yes, because list concatenation, i.e. using +, is an O(N) operation. When you do that to incrementally build a list of size N, it becomes O(N²).
Instead, using chain.from_iterable will simply iterate over all N items in the final list, using the list type constructor, which has linear performance.
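For intuition, chain.from_iterable is roughly equivalent to this simple generator (a sketch along the lines of the pure-Python equivalent shown in the itertools docs), so each element is touched exactly once:
def chain_from_iterable(iterables):
    # Yield every element of every sub-iterable once: O(N) work in total.
    for it in iterables:
        for element in it:
            yield element

flat = list(chain_from_iterable(a))  # list() consumes the generator in linear time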
This is why you shouldn't use sum to flatten a list (note, reduce(lambda x, y: x+y, ...) is simply sum).
Note, the idiomatic way to flatten a nested list like this is to use a list comprehension:
[x for sub in a for x in sub]
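All of these approaches produce the same flat list, just at very different costs; a quick sanity check (the variable names here are only for illustration):
from functools import reduce
from itertools import chain

flat_chain = list(chain.from_iterable(a))
flat_comp = [x for sub in a for x in sub]
flat_reduce = reduce(lambda x, y: x + y, a)   # quadratic, shown only for comparison
assert flat_chain == flat_comp == flat_reduce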
Flattening with sum is such an anti-pattern that sum explicitly prevents you from doing it with str objects:
>>> sum(['here', 'is', 'some', 'strings'], '')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
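As the error message says, the linear way to combine many strings is str.join, for example:
>>> ' '.join(['here', 'is', 'some', 'strings'])
'here is some strings'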
Note, your reduce/sum approach is equivalent to:
result = []
for sub in a:
result = result + sub
Which demonstrates the expensive + in the loop quite clearly. Note, the following naive approach actually has O(N) behavior instead of O(N²):
result = []
for sub in a:
result += sub
That is because my_list += something is equivalent to my_list.extend(something), and .extend (along with .append) has amortized constant-time behavior, so overall it will be O(N).
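If you want to see the asymptotic difference directly, here is a rough sketch (the helper names and sizes are arbitrary) that times both loops as the input grows; expect the repeated-+ version to roughly quadruple when the input doubles, while the += version roughly doubles:
import timeit

def flatten_concat(nested):
    result = []
    for sub in nested:
        result = result + sub   # copies the whole accumulated list each time: O(N^2) overall
    return result

def flatten_extend(nested):
    result = []
    for sub in nested:
        result += sub           # in-place extend: amortized O(1) per element, O(N) overall
    return result

for n in (1000, 2000, 4000):
    nested = [list(range(10)) for _ in range(n)]
    t_concat = timeit.timeit(lambda: flatten_concat(nested), number=5)
    t_extend = timeit.timeit(lambda: flatten_extend(nested), number=5)
    print(n, round(t_concat, 4), round(t_extend, 4))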
Upvotes: 6