Possibly a duplicate, but I couldn't find anything.
I have a very long iterator (10000 items) and I need to iterate over it in overlapping windows of ~500 items, advancing one item per step. So if my iterator were range(10000), it would look like this:
Iteration #1: 0, 1, 2, ... 497, 498, 499
Iteration #2: 1, 2, 3, ... 498, 499, 500
Iteration #3: 2, 3, 4, ... 499, 500, 501
Iteration #4: 3, 4, 5, ... 500, 501, 502
...
Iteration #9500: 9499, 9500, 9501 ... 9996, 9997, 9998
Iteration #9501: 9500, 9501, 9502 ... 9997, 9998, 9999
There is this slice-based method:
def nwise_slice(lst, n):
    for i in range(len(lst) - n + 1):
        yield lst[i:i + n]
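A quick sanity check of the slicing version against the listing above (a sketch; `data` and `windows` are illustrative names). A sequence of length L yields L - n + 1 windows, so 10000 items with n = 500 give 9501 windows, matching Iteration #9501.

```python
def nwise_slice(lst, n):
    # Needs len() and slicing, so lst must be a sequence (list, tuple, range, ...)
    for i in range(len(lst) - n + 1):
        yield lst[i:i + n]

data = list(range(10000))
windows = list(nwise_slice(data, 500))
print(len(windows))                      # 9501 == 10000 - 500 + 1
print(windows[0][:3], windows[0][-1])    # [0, 1, 2] 499
print(windows[-1][:3], windows[-1][-1])  # [9500, 9501, 9502] 9999
```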
However, this doesn't work with lazy iterators, because it relies on len() and slicing. I tried to write an iterator-based solution adapted from the itertools pairwise and consume recipes (see here):
import itertools

def nwise_iter(lst, n):
    iters = itertools.tee(lst, n)
    for idx, itr in enumerate(iters):
        next(itertools.islice(itr, idx, idx), None)
    for group in zip(*iters):
        yield group
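How the iterator version lines up its windows: `next(islice(itr, idx, idx), None)` is the `consume` idiom, which advances `itr` by `idx` items without producing anything, so the k-th `tee` copy starts k items ahead and `zip` reads one item from each copy per window, stopping when the most-advanced copy runs out. A minimal check (a sketch, redefining both functions from above so it runs on its own) that it matches the slicing version on a list and also accepts a one-shot generator:

```python
import itertools

def nwise_slice(lst, n):
    for i in range(len(lst) - n + 1):
        yield lst[i:i + n]

def nwise_iter(lst, n):
    # n independent copies of the input stream
    iters = itertools.tee(lst, n)
    # Advance copy number idx by idx items (the "consume" idiom)
    for idx, itr in enumerate(iters):
        next(itertools.islice(itr, idx, idx), None)
    # zip stops as soon as the most-advanced copy is exhausted
    yield from zip(*iters)

data = list(range(20))
assert [tuple(w) for w in nwise_slice(data, 5)] == list(nwise_iter(data, 5))

# Works on a generator, where nwise_slice would fail on len()
gen = (x * x for x in range(6))
print(list(nwise_iter(gen, 3)))
# [(0, 1, 4), (1, 4, 9), (4, 9, 16), (9, 16, 25)]
```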
which does the same thing (albeit yielding tuples rather than lists, which does not matter to me). I also believe it avoids creating a lot of unnecessary slices. This solution works on non-sliceable iterators, like files (which I plan to work with). However, the itertools solution was about 2x slower:
In [4]: %timeit list(nwise_slice(list(range(10000)), 500))
46.9 ms ± 729 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [5]: %timeit list(nwise_iter(list(range(10000)), 500))
102 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I don't want to have to load all of my test data into memory to take advantage of the slice
method. Is there a more efficient way to pull this off?
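What about using a bounded deque to "memoize" your items? A deque created with maxlen=n drops its oldest item whenever you append to it while it is full, so it always holds just the current window: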
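from collections import deque

def nwise_slice(it, n):
    deq = deque((), n)  # maxlen=n: appending to a full deque discards the oldest item
    for x in it:
        deq.append(x)
        if len(deq) == n:
            yield deq  # the same deque object every time; copy it (e.g. tuple(deq)) to keep a window

my_range = range(8)
for sub in nwise_slice(my_range, 5):
    print(sub)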
# =>
# deque([0, 1, 2, 3, 4], maxlen=5)
# deque([1, 2, 3, 4, 5], maxlen=5)
# deque([2, 3, 4, 5, 6], maxlen=5)
# deque([3, 4, 5, 6, 7], maxlen=5)
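One caveat with yielding the deque itself: every window is the same object, mutated on the next step. That is fine when each window is used before advancing (as in the print loop above), but collecting with list(...), as the question's %timeit calls do, leaves repeated references to the final window. A minimal sketch of a variant that yields an independent tuple per window, modeled on the sliding_window recipe in the itertools documentation; the name nwise_deque is hypothetical.

```python
from collections import deque
from itertools import islice

def nwise_slice(it, n):
    # The answer's version: yields the same (mutated) deque each time
    deq = deque((), n)
    for x in it:
        deq.append(x)
        if len(deq) == n:
            yield deq

def nwise_deque(iterable, n):
    # Hypothetical name; pre-fill n - 1 items, then emit a tuple snapshot per step
    it = iter(iterable)
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

aliased = list(nwise_slice(range(8), 5))
print(all(d is aliased[0] for d in aliased))  # True
print(aliased[0])                             # deque([3, 4, 5, 6, 7], maxlen=5)
print(list(nwise_deque(range(8), 5)))
# [(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7)]
```

The tuple copy costs O(n) per window, like a slice, but it still reads the input lazily and keeps only n items buffered, so it works on files and generators.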