Baz
Baz

Reputation: 13123

Accessing consecutive items when using a generator

Lets say I have a tuple generator, which I simulate as follows:

g = (x for x in (1,2,3,97,98,99))

For this specific generator, I wish to write a function to output the following:

(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)

So I'm iterating over three consecutive items at a time and printing them, except when I approach the end.

Should the first line in my function be:

t = tuple(g)

In other words, is it best to work on a tuple directly or might it be beneficial to work with a generator. If it is possible to approach this problem using both methods, please state the benefits and disadvantages for both approaches. Also, if it might be wise to use the generator approach, how might such a solution look?

Here's what I currently do:

def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])

data = (x for x in (1,2,3,4,5))
f(data,3)

UPDATE:

Note that I've updated my function to take a second argument specifying the length of the window.

Upvotes: 5

Views: 326

Answers (7)

martineau
martineau

Reputation: 123443

Here's a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators whenever possible, so it should be relatively memory efficient.

try:
    from itertools import izip, izip_longest, takewhile
except ImportError:  # Python 3
    izip = zip
    from itertools import zip_longest as izip_longest, takewhile

def tuple_window(n, iterable):
    iterators = [iter(iterable) for _ in range(n)]
    for n, iterator in enumerate(iterators):
        for _ in range(n):
            next(iterator)
    _NULL = object()  # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))

if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)

Output:

(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)

Upvotes: 0

Veedrac
Veedrac

Reputation: 60137

It's definitely best to work with the generator because you don't want to have to hold everything in memory.

It can be done very simply with a deque.

from collections import deque
from itertools import islice

def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is given, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """

    chunker = deque(maxlen=size)
    iterator = iter(iterable)

    for item in islice(iterator, size-1):
        chunker.append(item)

        if head:
            yield list(chunker)

    for item in iterator:
        chunker.append(item)
        yield list(chunker)

    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker) 

Upvotes: 1

Blckknght
Blckknght

Reputation: 104712

If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck" and meaning "double-ended queue") can have values pushed and popped efficiently from both ends.

from collections import deque
from itertools import islice

def get_tuples(gen, n):
    q = deque(islice(gen, n))   # pre-load the queue with `n` values
    while q:                    # run until the queue is empty
        yield tuple(q)          # yield a tuple copied from the current queue
        q.popleft()             # remove the oldest value from the queue
        try:
            q.append(next(gen)) # try to add a new value from the generator
        except StopIteration:
            pass                # but we don't care if there are none left

Upvotes: 2

Gemmu
Gemmu

Reputation: 504

I think what you currently do seems a lot easier than any of the above. If there isn't any particular need to make it more complicated, my opinion would be to keep it simple. In other words, it is best to work on a tuple directly.

Upvotes: 0

roman
roman

Reputation: 117370

Actually there're functions for this in itertools module - tee() and izip_longest():

>>> from itertools import izip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
>>> next(c, None)
>>> next(c, None)
>>> [[x for x in l if x is not None] for l in izip_longest(a, b, c)]
[(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99)]

from documentation:

Return n independent iterators from a single iterable. Equivalent to:

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

Upvotes: 3

Ant
Ant

Reputation: 5414

actually it depends.

A generator might be useful in case of very large collections, where you dont really need to store them all in memory to achieve the result you want. On the other hand, you have to print it is seems safe to guess that the collection isn't huge, so it doesn make a difference.

However, this is a generator that achieve what you were looking for

def part(gen, size):
    t = tuple()
    try:
        while True:
        l = gen.next()
        if len(t) < size:
            t = t + (l,)
            if len(t) == size:
                yield t
            continue
        if len(t) == size:
            t = t[1:] + (l,)
            yield t
            continue
    except StopIteration:
        while len(t) > 1:
        t = t[1:]
        yield t

>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>> 

note: the code actually isn't very elegant but it works also when you have to split in, say, 5 pieces

Upvotes: 1

David Zwicker
David Zwicker

Reputation: 24278

A specific example for returning three items could read

def yield3(gen):
    b, c = gen.next(), gen.next()
    try:
        while True:
            a, b, c = b, c, gen.next()
            yield (a, b, c)
    except StopIteration:
        yield (b, c)
        yield (c,)


g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print l

Upvotes: 3

Related Questions