Reputation: 2669
I need to group either a list of floats, or a list of (named)tuples of varying length, into groups based on whether or not the key is bigger or smaller than a given value.
For example given a list of powers of 2 less than 1, and a list of cutoffs:
twos = [2**(-(i+1)) for i in range(0,10)]
cutoffs = [0.5, 0.125, 0.03125]
Then function
split_into_groups(twos, cutoffs)
should return
[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
I've implemented the function like this:
def split_by_prob(items, cutoff, groups, key=None):
for k,g in groupby(enumerate(items), lambda (j,x): x<cutoff):
groups.append((map(itemgetter(1),g)))
return groups
def split_into_groups(items, cutoffs, key=None):
groups = items
final = []
for i in cutoffs:
groups = split_by_prob(groups,i,[],key)
if len(groups) > 1:
final.append(groups[0])
groups = groups.pop()
else:
final.append(groups[0])
return final
final.append(groups)
return final
The tests that these currently pass are:
>>> split_by_prob(twos, 0.5, [])
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
>>> split_into_groups(twos, cutoffs)
[[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
>>> split_into_groups(twos, cutoffs_p10)
[[0.5, 0.25, 0.125], [0.0625, 0.03125, 0.015625], [0.0078125, 0.00390625, 0.001953125], [0.0009765625]]
Where cutoffs_p10 = [10**(-(i+1)) for i in range(0,5)]
I can straightforwardly extend this to a list of tuples of the form
items = zip(range(0,10), twos)
by changing
def split_by_prob(items, cutoff, groups, key=None):
for k,g in groupby(enumerate(items), lambda (j,x): x<cutoff):
groups.append((map(itemgetter(1),g)))
return groups
to
def split_by_prob(items, cutoff, groups, key=None):
for k,g in groupby(enumerate(items), lambda (j,x): x[1]<cutoff):
groups.append((map(itemgetter(1),g)))
return groups
How do I go about extending the original method by adding a key that defaults to a list of floats (or ints etc) but one that could handle tuples and namedtuples?
For example something like:
split_into_groups(items, cutoffs, key=items[0])
would return
[[(0,0.5)], [(1,0.25), (2,0.125)], [(3,0.0625), (4,0.03125)], [(5,0.015625), (6,0.0078125), (7,0.00390625), (8,0.001953125), (9,0.0009765625)]]
Upvotes: 1
Views: 55
Reputation: 44112
In my answer I assume, the cutoffs are at the end in increasing order - just to simplify the situation.
Discriminator
detecting a slotclass Discriminator(object):
def __init__(self, cutoffs):
self.cutoffs = sorted(cutoffs)
self.maxslot = len(cutoffs)
def findslot(self, num):
cutoffs = self.cutoffs
for slot, edge in enumerate(self.cutoffs):
if num < edge:
return slot
return self.maxslot
grouper
to put items into slotsfrom collections import defaultdict
def grouper(cutoffs, items, key=None):
if not key:
key = lambda itm: itm
discr = Discriminator(cutoffs)
result = defaultdict(list)
for item in items:
num = key(item)
result[discr.findslot(num)].append(item)
return result
def split_into_groups(cutoffs, numbers, key=None):
groups = grouper(cutoffs, numbers, key)
slot_ids = sorted(groups.keys())
return [groups[slot_id] for slot_id in slot_ids]
grouper
Proposed Discriminator works even for unsorted items.
key
In fact, providing the key
functions is easier, than it originally looked.
It is just a function provided via parameter, so it becomes an alias for the transformation function to call to get the value, we want to use for comparing, grouping etc.
There is special case of None
, for such a situation we have to use some identity function.
Simplest one is
func = lambda itm: itm
Note: all the functions above were tested by a test suite (incl. use of key
function, but I removed it from this answer as it was becoming far too long.
Upvotes: 1