smci

Reputation: 33950

Why is chaining iterables this complicated? Simplify this code

I want to chain multiple iterables, all lazily evaluated (speed is crucial), to do the following.

The real example is more complex; here's a simplified version:

Here's a sample line of stdin: 2 13 4 16 16 15 22 17 8 8 7 6

(For debugging purposes, instream below might point to sys.stdin or to an opened file handle.)

You can't simply nest the map() calls, since map(str.split, instream) lazily yields one list of strings per line rather than individual tokens:

import itertools
gen1 = map(int, map(str.split, instream))  # CAN'T CHAIN DIRECTLY

The least complicated working solution I found is this; surely it can be simplified?

gen1 = map(int, itertools.chain.from_iterable(itertools.chain(map(str.split, instream))))

Why the hell do I need itertools.chain.from_iterable(itertools.chain(...)) just to process the result of map(str.split, instream)? It sort of defeats the purpose. Is manually defining my own generators faster?

Upvotes: 3

Views: 719

Answers (2)

PeterE

Reputation: 5855

You could build your generator by hand:

import string

def gen1(stream):
    # presuming that stream is a text stream (io.TextIOBase)
    s = ""
    c = stream.read(1)
    while c:                          # an empty string means EOF
        if c not in string.digits:
            # a non-digit ends the current number, if any
            if s:
                yield int(s)
                s = ""
        else:
            s += c
        c = stream.read(1)
    if s:                             # flush the final number at EOF
        yield int(s)


import io
g = gen1(io.StringIO("12 45  6 7 88"))
for x in g:    # dangerous if the stream is unlimited
    print(x)

This is certainly not the most beautiful code, but it does what you want. Explanation:

If your input is indefinitely long, you have to read it in chunks (or character by character). Whenever you encounter a non-digit (whitespace), you convert the characters read up to that point into an integer and yield it. You also have to handle what happens at EOF. My implementation is probably not very performant, because it reads character by character; reading in chunks, as sketched below, would speed it up significantly.
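
For illustration, a chunked variant might look like the sketch below (gen1_chunked and chunk_size are names introduced here, not part of the original answer):

import string

def gen1_chunked(stream, chunk_size=4096):
    # Hypothetical chunked version: read blocks instead of single
    # characters, but keep the same digit-accumulation logic.
    s = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:             # EOF
            break
        for c in chunk:
            if c in string.digits:
                s += c
            elif s:               # a non-digit ends the current number
                yield int(s)
                s = ""
    if s:                         # flush the final number
        yield int(s)

import io
print(list(gen1_chunked(io.StringIO("12 45  6 7 88"))))   # [12, 45, 6, 7, 88]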

EDIT as to why your approach will never work:

map(str.split, instream)

simply does not do what you appear to think it does. map applies the given function str.split to each element of the iterator passed as the second parameter. In your case that is a stream, i.e. a file object (for sys.stdin specifically an io.TextIOBase object), which can indeed be iterated over, but line by line, which is emphatically NOT what you want! In effect you iterate over your input line by line and split each line into words, so the map generator yields (many) lists of words, NOT one flat list of words. That is why you have to chain them together to get a single sequence to iterate over.
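
To make that concrete, here is a small demonstration with io.StringIO standing in for instream (the values are illustrative only):

import io

instream = io.StringIO("2 13 4\n16 16 15\n")
# One list of word-strings per input line, not a flat stream of words:
print(list(map(str.split, instream)))
# [['2', '13', '4'], ['16', '16', '15']]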

Also, the itertools.chain() in itertools.chain.from_iterable(itertools.chain(map(...))) is redundant. itertools.chain chains its arguments (each an iterable object) together into one iterator. You only give it one argument, so there is nothing to chain together; it effectively yields the same elements as the map object, unchanged. itertools.chain.from_iterable(), on the other hand, takes a single argument, which is expected to be an iterator of iterables (e.g. a list of lists), and flattens it into one iterator (list).
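
A quick illustration of the difference (a plain nested list stands in for the map output here):

import itertools

nested = [["1", "2"], ["3"]]
# chain() with a single argument just yields that argument's elements:
print(list(itertools.chain(nested)))                 # [['1', '2'], ['3']]
# chain.from_iterable() flattens one level:
print(list(itertools.chain.from_iterable(nested)))   # ['1', '2', '3']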

EDIT2

import io, itertools

instream = io.StringIO("12 45 \n 66 7 88")
gen1 = itertools.chain.from_iterable(map(str.split, instream))
gen2 = map(int, gen1)
list(gen2)

returns

[12, 45, 66, 7, 88]

Upvotes: 0

shx2

Reputation: 64318

An explicit ("manual") generator expression should be preferred over using map and filter. It is more readable to most people, and more flexible.

If I understand your question, this generator expression does what you need:

gen1 = (int(x) for line in instream for x in line.split())
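
For example, with io.StringIO standing in for the stream (the sample values are illustrative only):

import io

instream = io.StringIO("2 13 4 16\n16 15 22 17\n")
gen1 = (int(x) for line in instream for x in line.split())
print(list(gen1))   # [2, 13, 4, 16, 16, 15, 22, 17]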

Upvotes: 2
