Extending grouping code to handle more general inputs

Question

I need to group either a list of floats, or a list of (named)tuples of varying length, into groups based on whether or not the key is bigger or smaller than a given value.

For example given a list of powers of 2 less than 1, and a list of cutoffs:

twos = [2**(-(i+1)) for i in range(0,10)]
cutoffs = [0.5, 0.125, 0.03125]

Then function

split_into_groups(twos, cutoffs)

should return

[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]

I've implemented the function like this:

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x 1:
            final.append(groups[0])
            groups = groups.pop()
        else:
            final.append(groups[0])
            return final
    final.append(groups)
    return final

The tests that these currently pass are:

>>> split_by_prob(twos, 0.5, [])
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]

>>> split_into_groups(twos, cutoffs)
[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
>>> split_into_groups(twos, cutoffs_p10)
[[0.5, 0.25, 0.125], [0.0625, 0.03125, 0.015625], [0.0078125, 0.00390625, 0.001953125], [0.0009765625]]

Where cutoffs_p10 = [10**(-(i+1)) for i in range(0,5)]

I can straightforwardly extend this to a list of tuples of the form

items = zip(range(0,10), twos)

by changing

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x



to 

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x[1]


How do I go about extending the original method by adding a key that defaults to a list of floats (or ints etc) but one that could handle tuples and namedtuples?

For example something like:

split_into_groups(items, cutoffs, key=items[0])


would return

[[(0,0.5)], [(1,0.25), (2,0.125)], [(3,0.0625), (4,0.03125)], [(5,0.015625), (6,0.0078125), (7,0.00390625), (8,0.001953125), (9,0.0009765625)]]

Jan Vlcinsky · Accepted Answer

In my answer I assume, the cutoffs are at the end in increasing order - just to simplify the situation.

`Discriminator` detecting a slot

class Discriminator(object):
    def __init__(self, cutoffs):
        self.cutoffs = sorted(cutoffs)
        self.maxslot = len(cutoffs)
    def findslot(self, num):
        cutoffs = self.cutoffs
        for slot, edge in enumerate(self.cutoffs):
            if num < edge:
                return slot
        return self.maxslot

`grouper` to put items into slots

from collections import defaultdict
def grouper(cutoffs, items, key=None):
    if not key:
        key = lambda itm: itm
    discr = Discriminator(cutoffs)
    result = defaultdict(list)
    for item in items:
        num = key(item)
        result[discr.findslot(num)].append(item)
    return result

def split_into_groups(cutoffs, numbers, key=None):
    groups = grouper(cutoffs, numbers, key)
    slot_ids = sorted(groups.keys())
    return [groups[slot_id] for slot_id in slot_ids]

Conclusions about Discriminator and `grouper`

Proposed Discriminator works even for unsorted items.

Conclusions about `key`

In fact, providing the key functions is easier, than it originally looked.

It is just a function provided via parameter, so it becomes an alias for the transformation function to call to get the value, we want to use for comparing, grouping etc.

There is special case of None, for such a situation we have to use some identity function.

Simplest one is

func = lambda itm: itm

Note: all the functions above were tested by a test suite (incl. use of key function, but I removed it from this answer as it was becoming far too long.

Extending grouping code to handle more general inputs

Answers (1)

`Discriminator` detecting a slot

`grouper` to put items into slots

Conclusions about Discriminator and `grouper`

Conclusions about `key`

Related Questions

Extending grouping code to handle more general inputs

Answers (1)

Discriminator detecting a slot

grouper to put items into slots

Conclusions about Discriminator and grouper

Conclusions about key

Related Questions

`Discriminator` detecting a slot

`grouper` to put items into slots

Conclusions about Discriminator and `grouper`

Conclusions about `key`