Reputation: 13123
Let's say I have a tuple generator, which I simulate as follows:
g = (x for x in (1,2,3,97,98,99))
For this specific generator, I wish to write a function to output the following:
(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)
So I'm iterating over three consecutive items at a time and printing them, except when I approach the end.
Should the first line in my function be:
t = tuple(g)
In other words, is it best to work on a tuple directly, or might it be beneficial to work with a generator? If it is possible to approach this problem using both methods, please state the benefits and disadvantages of each approach. Also, if it is wise to use the generator approach, how might such a solution look?
Here's what I currently do:
def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])

data = (x for x in (1,2,3,4,5))
f(data,3)
UPDATE:
Note that I've updated my function to take a second argument specifying the length of the window.
Upvotes: 5
Views: 326
Reputation: 123443
Here's a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators whenever possible, so it should be relatively memory efficient.
try:
    from itertools import izip_longest, takewhile, tee
except ImportError:  # Python 3
    from itertools import zip_longest as izip_longest, takewhile, tee

def tuple_window(n, iterable):
    # tee() gives n independent iterators, even for a one-shot generator
    # (calling iter() n times on a generator returns the same object n times).
    iterators = tee(iterable, n)
    # Advance the i-th iterator by i items so the iterators are staggered.
    for i, iterator in enumerate(iterators):
        for _ in range(i):
            next(iterator, None)  # Default avoids StopIteration on short input.
    _NULL = object()  # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))

if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)
Output:
(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)
Upvotes: 0
Reputation: 60137
It's definitely best to work with the generator because you don't want to have to hold everything in memory.
It can be done very simply with a deque.
from collections import deque
from itertools import islice

def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """
    chunker = deque(maxlen=size)
    iterator = iter(iterable)

    for item in islice(iterator, size-1):
        chunker.append(item)

        if head:
            yield list(chunker)

    for item in iterator:
        chunker.append(item)
        yield list(chunker)

    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker)
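To illustrate the memory argument, here is a minimal sketch (the helper name `sliding` is hypothetical and is a simplified stand-in, not the function above). It shows that a deque-based window can consume an endless iterator, something `tuple(g)` could never do:

```python
from collections import deque
from itertools import count, islice

def sliding(iterable, size):
    """Yield full windows of `size` consecutive items, lazily."""
    window = deque(maxlen=size)  # old items fall off the left automatically
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield tuple(window)

# count(1) never ends, so it cannot be turned into a tuple,
# but we can still take the first four windows.
first_four = list(islice(sliding(count(1), 3), 4))
print(first_four)
```

Only `size` items are held at any moment, regardless of how long the input is.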
Upvotes: 1
Reputation: 104712
If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck" and meaning "double-ended queue") can have values pushed and popped efficiently from both ends.
from collections import deque
from itertools import islice

def get_tuples(gen, n):
    gen = iter(gen)                 # also accept plain iterables, not just generators
    q = deque(islice(gen, n))       # pre-load the queue with `n` values
    while q:                        # run until the queue is empty
        yield tuple(q)              # yield a tuple copied from the current queue
        q.popleft()                 # remove the oldest value from the queue
        try:
            q.append(next(gen))     # try to add a new value from the generator
        except StopIteration:
            pass                    # but we don't care if there are none left
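For readers who haven't used a deque before, this small sketch shows the operations the function above relies on (`append`, `popleft`, copying to a tuple). The `maxlen` part is an optional extra that the function above does not use; the variable names are only illustrative:

```python
from collections import deque

q = deque([1, 2, 3])
q.append(97)          # push a value on the right end
oldest = q.popleft()  # pop the oldest value from the left end
snapshot = tuple(q)   # copy the current contents into a tuple

bounded = deque([1, 2, 3], maxlen=3)
bounded.append(97)    # with maxlen set, the leftmost item is discarded

print(oldest, snapshot, list(bounded))
```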
Upvotes: 2
Reputation: 504
I think what you currently do seems a lot easier than any of the above. If there isn't any particular need to make it more complicated, my opinion would be to keep it simple. In other words, it is best to work on a tuple directly.
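As a rough sketch of that simple approach, here is a variant that returns the windows instead of printing them (the function name `windows_from_tuple` is made up for this example):

```python
def windows_from_tuple(data, size):
    """Materialize `data`, then slice out one window per start index."""
    t = tuple(data)  # loads every item into memory at once
    # Slices near the end are simply shorter; slicing never raises IndexError.
    return [t[i:i + size] for i in range(len(t))]

result = windows_from_tuple((x for x in (1, 2, 3, 97, 98, 99)), 3)
print(result)
```

The trade-off is that the whole input must fit in memory, which is fine for small collections.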
Upvotes: 0
Reputation: 117370
Actually, there are functions for this in the itertools module: tee() and izip_longest() (named zip_longest() in Python 3):
>>> from itertools import izip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
1
>>> next(c, None)
1
>>> next(c, None)
2
>>> [tuple(x for x in l if x is not None) for l in izip_longest(a, b, c)]
[(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]
From the documentation:
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
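The same idea can be generalized to any window length. The following is only a sketch, assuming Python 3 (`zip_longest`); the names `windows_with_tee` and `_MISSING` are invented here. It uses a private sentinel as the fill value so that `None` items in the data are not filtered out by mistake:

```python
from itertools import islice, tee, zip_longest

_MISSING = object()  # private sentinel, so None values in the data survive

def windows_with_tee(iterable, n):
    """Yield n-item windows, shrinking at the end, using tee()."""
    iterators = tee(iterable, n)
    # islice(it, i, None) lazily skips the first i items of the i-th copy.
    staggered = [islice(it, i, None) for i, it in enumerate(iterators)]
    for group in zip_longest(*staggered, fillvalue=_MISSING):
        yield tuple(x for x in group if x is not _MISSING)

g = (x for x in (1, None, 3, 97))
result = list(windows_with_tee(g, 3))
print(result)
```

Note that tee() buffers the items that the slower copies have not consumed yet, so at most about `n` items are held here.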
Upvotes: 3
Reputation: 5414
Actually, it depends.
A generator can be useful for very large collections, where you don't need to hold every item in memory to get the result you want. On the other hand, since you are printing the results, it seems safe to guess that the collection isn't huge, so it doesn't make much difference.
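To give a rough feel for the memory difference, this sketch compares the size of a generator object with the size of the equivalent tuple. It assumes CPython, where `sys.getsizeof` reports the shallow size of the container only (not of the integers it refers to); the figure of 100000 items is arbitrary:

```python
import sys

n = 100000
as_generator = (x for x in range(n))  # nothing is computed yet
as_tuple = tuple(range(n))            # all n references are stored

gen_size = sys.getsizeof(as_generator)
tuple_size = sys.getsizeof(as_tuple)
print(gen_size, tuple_size)
```

The generator's size stays constant as `n` grows, while the tuple's size grows with it.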
However, here is a generator that achieves what you were looking for:
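def part(gen, size):
    t = tuple()
    try:
        while True:
            l = next(gen)  # next(gen) works in Python 2.6+ and 3; gen.next() is Python 2 only
            if len(t) < size:
                # still filling the first window
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                # slide the window forward by one item
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        # input exhausted: yield the shrinking tail windows
        while len(t) > 1:
            t = t[1:]
            yield t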
>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
Note: the code isn't very elegant, but it also works when you need windows of, say, 5 items.
Upvotes: 1
Reputation: 24278
A specific example for returning three items could read:
def yield3(gen):
    # next(gen) works in Python 2.6+ and 3; gen.next() is Python 2 only
    b, c = next(gen), next(gen)
    try:
        while True:
            a, b, c = b, c, next(gen)
            yield (a, b, c)
    except StopIteration:
        # b and c still hold the last two items seen
        yield (b, c)
        yield (c,)

g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print(l)
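One caveat: yield3 assumes the generator holds at least two items, because the first two values are fetched outside the try block. In Python 3.7 and later, a StopIteration escaping a generator body is turned into a RuntimeError. A possible variant that tolerates shorter input is sketched below; the name `yield3_safe` is hypothetical and the approach (tuple slicing plus islice) is a swapped-in technique, not the code above:

```python
from itertools import islice

def yield3_safe(iterable):
    """Three-item sliding window that tolerates inputs with fewer than 3 items."""
    it = iter(iterable)
    window = tuple(islice(it, 3))  # may hold 0, 1, 2 or 3 items
    for item in it:
        yield window
        window = window[1:] + (item,)  # slide forward by one item
    # Input exhausted: emit the last window, then its shrinking tails.
    while window:
        yield window
        window = window[1:]

g = (x for x in (1, 2, 3, 97, 98, 99))
result = list(yield3_safe(g))
print(result)
```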
Upvotes: 3