Reputation: 1036
Sometimes you need to merge a nested list into a single flat list (similar to np.flatten()).
When a list of lists like the one below is given, you have to flatten it:
a = [[j for j in range(0, 10)] for i in range(0, 10000)]
You have two kinds of solutions for it: itertools.chain.from_iterable and functools.reduce.
import itertools
from functools import reduce
%timeit list(itertools.chain.from_iterable(a))
%timeit reduce(lambda x, y: x + y, a)
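In case you want to reproduce this outside IPython (since %timeit is an IPython magic), here is a rough standalone version using the standard timeit module; the exact numbers will vary by machine:
import itertools
import timeit
from functools import reduce

a = [[j for j in range(0, 10)] for i in range(0, 10000)]

# Keep the repeat count small, because the reduce version is slow.
print(timeit.timeit(lambda: list(itertools.chain.from_iterable(a)), number=5))
print(timeit.timeit(lambda: reduce(lambda x, y: x + y, a), number=5))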
Which one do you think is faster, and how much faster than the other? itertools.chain.from_iterable is 1000 times faster or more (and the gap grows as the list gets longer).
If somebody knows why this happens, please let me know.
Thanks as always for your support and help.
Upvotes: 2
Views: 1001
Reputation: 96172
Yes, because list concatenation, i.e. using +, is an O(N) operation. When you do that to incrementally build a list of size N, it becomes O(N²).
Instead, using chain.from_iterable will simply iterate over all N items in the final list, using the list type constructor, which has linear performance.
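For intuition, chain.from_iterable is roughly equivalent to this simple generator (a sketch along the lines of the pure-Python equivalent shown in the itertools docs), so each element is touched exactly once:
def chain_from_iterable(iterables):
    # Yield every element of every sub-iterable once: O(N) work in total.
    for it in iterables:
        for element in it:
            yield element

flat = list(chain_from_iterable(a))  # list() consumes the generator in linear time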
This is why you shouldn't use sum to flatten a list (note, reduce(lambda x, y: x+y, ...) is simply sum).
Note, the idiomatic way to flatten a nested list like this is to use a list comprehension:
[x for sub in a for x in sub]
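All of these approaches produce the same flat list, just at very different costs; a quick sanity check (the variable names here are only for illustration):
from functools import reduce
from itertools import chain

flat_chain = list(chain.from_iterable(a))
flat_comp = [x for sub in a for x in sub]
flat_reduce = reduce(lambda x, y: x + y, a)   # quadratic, shown only for comparison
assert flat_chain == flat_comp == flat_reduce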
Flattening with sum is such an anti-pattern that sum explicitly prevents you from doing it with str objects:
>>> sum(['here', 'is', 'some', 'strings'], '')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
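As the error message says, the linear way to combine many strings is str.join, for example:
>>> ' '.join(['here', 'is', 'some', 'strings'])
'here is some strings'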
Note, your reduce/sum approach is equivalent to:
result = []
for sub in a:
result = result + sub
Which demonstrates the expensive + in the loop quite clearly. Note, the following naive approach actually has O(N) behavior instead of O(N²):
result = []
for sub in a:
result += sub
That is because my_list += something is equivalent to my_list.extend(something), and .extend (along with .append) has amortized constant-time behavior, so overall it will be O(N).
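If you want to see the asymptotic difference directly, here is a rough sketch (the helper names and sizes are arbitrary) that times both loops as the input grows; expect the repeated-+ version to roughly quadruple when the input doubles, while the += version roughly doubles:
import timeit

def flatten_concat(nested):
    result = []
    for sub in nested:
        result = result + sub   # copies the whole accumulated list each time: O(N^2) overall
    return result

def flatten_extend(nested):
    result = []
    for sub in nested:
        result += sub           # in-place extend: amortized O(1) per element, O(N) overall
    return result

for n in (1000, 2000, 4000):
    nested = [list(range(10)) for _ in range(n)]
    t_concat = timeit.timeit(lambda: flatten_concat(nested), number=5)
    t_extend = timeit.timeit(lambda: flatten_extend(nested), number=5)
    print(n, round(t_concat, 4), round(t_extend, 4))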
Upvotes: 6