LordLing
LordLing

Reputation: 331

Python 3.3's yield from

Python 3 brings the yield from semantics. As far as I understand it's supposed to yield to the outermost generator in which case I'd expect this code to be linear in N.

from collections import Iterable

def flatten(L):
  for e in L:
    if isinstance(e, Iterable):
      yield from flatten(e)
    else:
      yield e 

N = 100
L = [-1]
for i in range(N):
  L = [i, [L], i]
for i in range(100):
  f = list(flatten(L))
print(len(f))

If I set N=200 however the computation time is about four times longer suggesting flatten is quadratic in the length of L. I can't understand why it would be since the code visits each element only once and the yield from keyword should be sending values directly from the inner generator to where the terms are collected into the list.

Is this a bug, simply not intended, or am I using it wrong? Is there a nice way to do O(N) flattening in Python?

Upvotes: 6

Views: 2426

Answers (1)

DSM
DSM

Reputation: 353419

yield from, just like for item in x: yield x, is linear. However, function calls are slow, and because of the nesting in your l, when you double N, you're not merely doubling the number of terms, you're doubling the number of calls needed. Anything which scales with the number of calls, like function overhead itself esp. due to the recursion, any yield from overhead, for loop initialization overhead, whatever, would therefore cause a problem. If this is right, then we'd expect that a list with the same number of elements and no nesting would be linear, mid-nesting would be in-between, and lots of nesting would be super-slow. And that's what we see:

import timeit

s = """

from collections import Iterable

def flatten(l):
   for e in l:
       if isinstance(e, Iterable):
           yield from flatten(e)
       else:
           yield e 

def build_lots(N):
    l = [-1]
    for i in range(N):
        l = [i, [l], i]
    return l

def build_some(N):
    l = [-1]
    for i in range(N):
        l = [i]+l+[i] if i % 2 else [i,[l],i]
    return l

def build_none(N):
    return range(2*N+1)

"""

def get_time(size, which_build, n=100):
    setup = s + "l = build_{}({});".format(which_build, size)
    ans = timeit.Timer(setup=setup, stmt="z=list(flatten(l))").timeit(n)
    return ans

print('length', 'none','some','lots')
for w in range(0, 500, 50):
    print(2*w+1, 
          get_time(w, 'none'), 
          get_time(w, 'some'),
          get_time(w, 'lots'))

produces

length none some lots
1 0.0012789969332516193 0.0006600483320653439 0.000653265044093132
101 0.030214487109333277 0.06863495009019971 0.10554128512740135
201 0.05980244372040033 0.188231083098799 0.3237948380410671
301 0.08960435865446925 0.3752179825678468 0.6493003228679299
401 0.11986956000328064 0.6066137161105871 1.147628225851804
501 0.14737469609826803 0.9323012446984649 1.7087256000377238
601 0.18555369088426232 1.2575508910231292 2.2957410947419703
701 0.20820995513349771 1.712264522910118 3.094764341134578
801 0.23618148919194937 2.100640726275742 4.079551971051842
901 0.26863432209938765 2.617169467266649 4.858607416041195

simple plot of time vs nesting

Upvotes: 14

Related Questions