Azat Ibrakov

Reputation: 10953

Consecutive slices of iterable

Suppose I have an iterator

numbers = iter(range(100))

and I want to compute the consecutive mean values (the mean of each prefix) and store them in an iterable with elements

0., 0.5, ..., 49., 49.5

This could be done by converting the iterable to a list/tuple and computing the means of its slices, like

from statistics import mean

# in cases with large or potentially infinite amounts of data
# this conversion will fail
numbers_list = list(numbers)
numbers_slices = (numbers_list[:end + 1] for end in range(len(numbers_list)))
mean_values = map(mean, numbers_slices)

(more info about the mean function in the docs)

So my question is more general: is there any way to get consecutive slices of an iterable using the standard library without wrapping it in a list/tuple?


We can write a utility function like

def get_slices(iterable):
    elements = []
    for element in iterable:
        elements.append(element)
        yield elements

and then

numbers_slices = get_slices(numbers)
mean_values = map(mean, numbers_slices)

but it also looks awful


P.S.: I know that it would be better to compute the consecutive mean values like

def get_mean_values(numbers):
    numbers_sum = 0
    for numbers_count, number in enumerate(numbers, start=1):
        numbers_sum += number
        yield numbers_sum / numbers_count

but it is not what I am talking about.
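For comparison, the P.S. approach can also be assembled from standard-library pieces: itertools.accumulate yields the running sums lazily and enumerate supplies the counts. A minimal sketch (running_means is a hypothetical name) that never stores a prefix, so it also works on an infinite iterator:

```python
from itertools import accumulate, count, islice


def running_means(numbers):
    """Lazily yield the mean of every prefix of `numbers` (hypothetical helper)."""
    # accumulate() yields the running sums; enumerate() supplies the prefix lengths
    sums = accumulate(numbers)
    return (total / length for length, total in enumerate(sums, start=1))


means = list(running_means(iter(range(100))))
print(means[:4], means[-1])  # [0.0, 0.5, 1.0, 1.5] 49.5

# nothing is materialized, so an infinite iterator works too
first_three = list(islice(running_means(count()), 3))
print(first_three)  # [0.0, 0.5, 1.0]
```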

Upvotes: 2

Views: 1114

Answers (3)

Azat Ibrakov

Reputation: 10953

It seems like there is no standard way of getting consecutive slices of an iterable (iterator/list/tuple/etc.),

so the best way I've found is to use a slightly modified version of the utility function from the original question:

def consecutive_slices(iterable):
    elements = []
    for element in iterable:
        elements.append(element)
        yield list(elements)

Modifications:

  • added copying of elements (there are many ways of doing that), because with the previous version, wrapping the generator in a list

    >>> numbers_slices = list(get_slices(numbers))
    

    gives us a list containing the same elements list N times, with all the numbers in it (N equals 100 in the example):

    >>> numbers_slices == [list(range(100))] * 100
    True
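A small sketch contrasting the two generators (both names taken from this thread) makes the difference visible. It also suggests that the original version only misbehaves when the slices are kept around, not when each one is consumed before the next is produced, as with a lazy map():

```python
from statistics import mean


def get_slices(iterable):
    # question's version: yields the *same* list object on every step
    elements = []
    for element in iterable:
        elements.append(element)
        yield elements


def consecutive_slices(iterable):
    # answer's version: yields a fresh copy of the prefix on every step
    elements = []
    for element in iterable:
        elements.append(element)
        yield list(elements)


shared = list(get_slices(range(3)))
copied = list(consecutive_slices(range(3)))
print(shared)                  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(copied)                  # [[0], [0, 1], [0, 1, 2]]
print(shared[0] is shared[2])  # True

# consumed lazily, each mean is computed before the next append,
# so the shared list does not produce wrong results here
lazy_means = list(map(mean, get_slices(range(3))))
print(lazy_means)              # [0, 0.5, 1]
```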
    

Functional approach

After experimenting a bit more, I realized that this can also be done using the itertools module:

from itertools import (accumulate,
                       chain)


def consecutive_slices(iterable):
    def collect_elements(previous_elements, element):
        return previous_elements + [element]

    return accumulate(chain(([],), iterable), collect_elements)

Here we prepend an empty list using chain as the initial slice; it can be skipped in the result using islice:

from itertools import islice
...
islice(consecutive_slices(range(10)), 1, None)

but it seems legitimate to keep it, since an empty slice is also a slice after all.

Compared with the previous solution, this is still a 4-line function that does nearly the same thing, but it is less "spaghetti", IMO.
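Since Python 3.8, itertools.accumulate also accepts a keyword-only initial argument, which can replace the chain(([],), iterable) trick. A sketch assuming Python 3.8+; as in the chain-based version, every step builds a new list, so keeping all prefixes costs memory quadratic in the input length:

```python
from itertools import accumulate, islice


def consecutive_slices(iterable):
    # `initial=[]` makes accumulate() yield the empty prefix first,
    # then prefix + [element] for each element (a new list every step)
    return accumulate(iterable, lambda prefix, element: prefix + [element], initial=[])


with_empty = list(consecutive_slices(range(3)))
print(with_empty)     # [[], [0], [0, 1], [0, 1, 2]]

# skip the leading empty slice, as with the chain-based version
without_empty = list(islice(consecutive_slices(range(3)), 1, None))
print(without_empty)  # [[0], [0, 1], [0, 1, 2]]
```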

Upvotes: 2

Netwave

Reputation: 42678

Take a look at itertools.islice:

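import itertools


def get_slices(iterable):
    # needs a sequence (len() and re-iteration), e.g. a list or range;
    # stops run from 1 to len so every prefix, including the full one, is produced
    return map(lambda x: itertools.islice(iterable, x), range(1, len(iterable) + 1))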
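A usage sketch of the islice approach with the stop values running from 1 to len(sequence), so every prefix, including the full one, is produced (prefix_slices is a hypothetical name). It also shows why this approach needs a sequence rather than a one-shot iterator:

```python
from itertools import islice
from statistics import mean


def prefix_slices(sequence):
    # hypothetical helper: islice() starts a fresh iterator over `sequence` on
    # each call, so this needs a re-iterable sequence with len() (list, tuple, range)
    return (islice(sequence, stop) for stop in range(1, len(sequence) + 1))


prefixes = [list(s) for s in prefix_slices([10, 20, 30])]
print(prefixes)  # [[10], [10, 20], [10, 20, 30]]

means = list(map(mean, prefix_slices(range(5))))
print(means)     # [0, 0.5, 1, 1.5, 2]

try:
    prefix_slices(iter(range(5)))
    supports_iterators = True
except TypeError:  # len() is evaluated eagerly when the generator is created
    supports_iterators = False
print(supports_iterators)  # False
```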

If you don't know the length, here is a reduce version, which is highly inefficient in memory:

from functools import reduce
numbers = (number for number in range(1,100))
mean = lambda x, y: (x+y)/float(2)
reduce(lambda x, y: x + [mean(x[-1], y)], numbers, [0])
[0.0, 0.5, 1.25, 2.125, 3.0625, 4.03125, 5.015625, 6.0078125, 7.00390625, 8.001953125, 9.0009765625, 10.00048828125, 11.000244140625, 12.0001220703125, 13.00006103515625, 14.000030517578125, 15.000015258789062, 16.00000762939453, 17.000003814697266, 18.000001907348633, 19.000000953674316, 20.000000476837158, 21.00000023841858, 22.00000011920929, 23.000000059604645, 24.000000029802322, 25.00000001490116, 26.00000000745058, 27.00000000372529, 28.000000001862645, 29.000000000931323, 30.00000000046566, 31.00000000023283, 32.000000000116415, 33.00000000005821, 34.000000000029104, 35.00000000001455, 36.000000000007276, 37.00000000000364, 38.00000000000182, 39.00000000000091, 40.000000000000455, 41.00000000000023, 42.000000000000114, 43.00000000000006, 44.00000000000003, 45.000000000000014, 46.00000000000001, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0]

So, in the end, we are doing almost the same thing as your code, so you should go with it, or use a list instead of a generator and then take slices of the list (itertools.islice).

EDIT: I've been thinking about this some more. It is easily solved with Haskell's scanl, so I generalized the concept and got a very good result:

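def scanl(f, g):
    g = iter(g)  # accept any iterable, not only iterators
    try:
        n = next(g)
    except StopIteration:
        return  # empty input: yield nothing (avoids RuntimeError under PEP 479)
    yield n
    for e in g:
        n = f(n, e)
        yield n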

list(scanl(mean, iter(range(100))))
[0, 0.5, 1.25, 2.125, 3.0625, 4.03125, 5.015625, 6.0078125, 7.00390625, 8.001953125, 9.0009765625, 10.00048828125, 11.000244140625, 12.0001220703125, 13.00006103515625, 14.000030517578125, 15.000015258789062, 16.00000762939453, 17.000003814697266, 18.000001907348633, 19.000000953674316, 20.000000476837158, 21.00000023841858, 22.00000011920929, 23.000000059604645, 24.000000029802322, 25.00000001490116, 26.00000000745058, 27.00000000372529, 28.000000001862645, 29.000000000931323, 30.00000000046566, 31.00000000023283, 32.000000000116415, 33.00000000005821, 34.000000000029104, 35.00000000001455, 36.000000000007276, 37.00000000000364, 38.00000000000182, 39.00000000000091, 40.000000000000455, 41.00000000000023, 42.000000000000114, 43.00000000000006, 44.00000000000003, 45.000000000000014, 46.00000000000001, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0]
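Note that the two-argument mean lambda used in this answer averages the previous result with the next element, so the values track the most recent numbers (ending at 98.0) rather than giving the cumulative means 0., 0.5, ..., 49.5 from the question. A sketch of the same scanl idea that does produce cumulative means, assuming the accumulator carries a (running_sum, count) pair; add_pairs is a hypothetical helper, and itertools.accumulate is shown as a standard-library equivalent of this scanl:

```python
from itertools import accumulate


def scanl(f, iterable):
    # Haskell-style scanl1: yield the first element, then every running result
    it = iter(iterable)
    try:
        acc = next(it)
    except StopIteration:
        return  # empty input yields nothing
    yield acc
    for element in it:
        acc = f(acc, element)
        yield acc


def add_pairs(a, b):
    # hypothetical helper: combine (running_sum, count) accumulators element-wise
    return a[0] + b[0], a[1] + b[1]


# each number enters as (value, 1), so the accumulator tracks sum and count
states = scanl(add_pairs, ((n, 1) for n in range(100)))
means = [total / count for total, count in states]
print(means[:4], means[-1])  # [0.0, 0.5, 1.0, 1.5] 49.5

# itertools.accumulate behaves like scanl1 for a binary function
same = [total / count for total, count in accumulate(((n, 1) for n in range(100)), add_pairs)]
print(same == means)  # True
```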

Upvotes: 0

Arthur Tacca

Reputation: 9988

You could have a generator that directly yields the means, with local variables holding the running total and count. (Actually, you could get the count for free by iterating over enumerate(iterable) and adding 1 to the index.) Is that enough of a hint?

Upvotes: 1
