Tom Kealy
Tom Kealy

Reputation: 2669

Extending grouping code to handle more general inputs

I need to group either a list of floats, or a list of (named)tuples of varying length, into groups based on whether or not the key is bigger or smaller than a given value.

For example given a list of powers of 2 less than 1, and a list of cutoffs:

twos = [2**(-(i+1)) for i in range(0,10)]
cutoffs = [0.5, 0.125, 0.03125]

Then function

split_into_groups(twos, cutoffs)

should return

[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]

I've implemented the function like this:

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x<cutoff):
        groups.append((map(itemgetter(1),g)))
    return groups

def split_into_groups(items, cutoffs, key=None):
    groups = items
    final = []
    for i in cutoffs:
        groups = split_by_prob(groups,i,[],key)
        if len(groups) > 1:
            final.append(groups[0])
            groups = groups.pop()
        else:
            final.append(groups[0])
            return final
    final.append(groups)
    return final

The tests that these currently pass are:

>>> split_by_prob(twos, 0.5, [])
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]

>>> split_into_groups(twos, cutoffs)
[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
>>> split_into_groups(twos, cutoffs_p10)
[[0.5, 0.25, 0.125], [0.0625, 0.03125, 0.015625], [0.0078125, 0.00390625, 0.001953125], [0.0009765625]]

Where cutoffs_p10 = [10**(-(i+1)) for i in range(0,5)]

I can straightforwardly extend this to a list of tuples of the form

items = zip(range(0,10), twos)

by changing

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x<cutoff):
        groups.append((map(itemgetter(1),g)))
    return groups

to

def split_by_prob(items, cutoff, groups, key=None):
    for k,g in groupby(enumerate(items), lambda (j,x): x[1]<cutoff):
        groups.append((map(itemgetter(1),g)))
    return groups

How do I go about extending the original method by adding a key that defaults to a list of floats (or ints etc) but one that could handle tuples and namedtuples?

For example something like:

split_into_groups(items, cutoffs, key=items[0])

would return

[[(0,0.5)], [(1,0.25), (2,0.125)], [(3,0.0625), (4,0.03125)], [(5,0.015625), (6,0.0078125), (7,0.00390625), (8,0.001953125), (9,0.0009765625)]]

Upvotes: 1

Views: 55

Answers (1)

Jan Vlcinsky
Jan Vlcinsky

Reputation: 44112

In my answer I assume, the cutoffs are at the end in increasing order - just to simplify the situation.

Discriminator detecting a slot

class Discriminator(object):
    def __init__(self, cutoffs):
        self.cutoffs = sorted(cutoffs)
        self.maxslot = len(cutoffs)
    def findslot(self, num):
        cutoffs = self.cutoffs
        for slot, edge in enumerate(self.cutoffs):
            if num < edge:
                return slot
        return self.maxslot

grouper to put items into slots

from collections import defaultdict
def grouper(cutoffs, items, key=None):
    if not key:
        key = lambda itm: itm
    discr = Discriminator(cutoffs)
    result = defaultdict(list)
    for item in items:
        num = key(item)
        result[discr.findslot(num)].append(item)
    return result

def split_into_groups(cutoffs, numbers, key=None):
    groups = grouper(cutoffs, numbers, key)
    slot_ids = sorted(groups.keys())
    return [groups[slot_id] for slot_id in slot_ids]

Conclusions about Discriminator and grouper

Proposed Discriminator works even for unsorted items.

Conclusions about key

In fact, providing the key functions is easier, than it originally looked.

It is just a function provided via parameter, so it becomes an alias for the transformation function to call to get the value, we want to use for comparing, grouping etc.

There is special case of None, for such a situation we have to use some identity function.

Simplest one is

func = lambda itm: itm

Note: all the functions above were tested by a test suite (incl. use of key function, but I removed it from this answer as it was becoming far too long.

Upvotes: 1

Related Questions