Astrobeer3plus

Reputation: 45

Binning variable length lists in python

I have a dictionary d with 100 keys where the values are variable length lists, e.g.

 In[165]: d.values()[0]
 Out[165]: 
 [0.0432,
  0.0336,
  0.0345,
  0.044,
  0.0394,
  0.0555]

 In[166]: d.values()[1]
 Out[166]: 
 [0.0236,
  0.0333,
  0.0571]

Here's what I'd like to do: for every list in d.values(), I'd like to organize the values into 10 bins (where a value gets tossed into a bin if it satisfies the criteria, e.g. is between 0.03 and 0.04, 0.04 and 0.05, etc.).

What I'd like to end up with is something that looks exactly like d, but instead of d.values()[0] being a list of numbers, I'd like it to be a list of lists, like so:

 In[167]: d.values()[0]
 Out[167]:
 [[0.0336,0.0345,0.0394],
  [0.0432,0.044],
  [0.0555]]

Each key would still be associated with the same values, but they'd be structured into the 10 bins.

I've been going crazy with nested for loops and if/elses, etc. What is the best way to go about this?

EDIT: Hi, all. Just wanted to let you know I resolved my issues. I used a variation of @Brent Washburne's answer. Thanks for the help!

Upvotes: 2

Views: 1162

Answers (3)

sabbahillel

Reputation: 4425

Why not make the values a list of dictionaries, where each key is the bin indicator and each value is a list of the items that fall in that bin?

You would define

newd = [{bin1:[], bin2:[], ...binn:[]}, ... ]
newd[0][bin1] = (list of items in d[0] that belong in bin1)

You now have a list of dictionaries each of which has the appropriate bin listings.

newd[0] is now the equivalent of a dictionary built from d[0]: each key (which I call bin1, bin2, ... binn) maps to a list of the values that belong in that bin. Thus we have newd[0][bin1], newd[0][bin2], ... newd[k][lastbin].

Dictionary creation allows you to create the appropriate key and value list as you go along. If there is not yet a particular bin key, create the empty list and then the append of the value to the list will succeed.

Now when you want to identify the elements of a bin, you can loop through the list newd and extract whichever bin you want. This allows you to have bins with no entries without having to create empty lists. If a bin key is not in one of the dictionaries in newd, the lookup can be set to return an empty list as a default (to avoid the invalid-key exception).
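A rough sketch of this idea, under a few assumptions: the bins are 0.01 wide (taken from the question's example), the bin key is the integer int(v * 100), and newd is keyed like d instead of being a plain list. The name bin_values and the keys "a" and "b" are made up for illustration.

```python
from collections import defaultdict


def bin_values(values):
    # Hypothetical helper: group values by an integer bin key (0.0345 -> 3).
    bins = defaultdict(list)      # a missing bin key starts out as an empty list
    for v in values:
        bins[int(v * 100)].append(v)
    return dict(bins)             # plain dict, so later lookups do not create keys


# Sample data copied from the question; the keys are placeholders.
d = {
    "a": [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555],
    "b": [0.0236, 0.0333, 0.0571],
}

newd = {key: bin_values(values) for key, values in d.items()}

# A bin with no entries is simply absent; .get() supplies the empty default.
empty_bin = newd["b"].get(4, [])
```

Only the bins that actually received a value exist as keys, so reading a bin through .get(key, []) is what stands in for the empty lists.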

Upvotes: 1

Kasravnd

Reputation: 107287

You can use the itertools.groupby() function with a suitable key function to categorize your items. In this case you can use floor(x*100) as the key function:

>>> from math import floor
>>> from itertools import groupby
>>> lst = [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555]
>>> [list(g) for _,g in groupby(sorted(lst), key=lambda x: floor(x*100))]
[[0.0336, 0.0345, 0.0394], [0.0432, 0.044], [0.0555]]

To apply this to your values, you can use a dictionary comprehension:

def categorizer(val):
    return [list(g) for _,g in groupby(sorted(val), key=lambda x: floor(x*100))]

new_dict = {k:categorizer(v) for k,v in old_dict.items()}

As another approach, which is faster in terms of execution speed, you can use a dictionary for categorizing:

>>> def categorizer(val):
...     d = {}    # new dict on every call; a mutable default argument would be shared between calls
...     for i in val:
...         d.setdefault(floor(i*100),[]).append(i)
...     return list(d.values())
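One thing to watch with the dictionary version is that the groups come back in the dictionary's own order, not necessarily in ascending bin order. If the order matters, a variant along these lines could sort by bin key before returning. The name categorizer_sorted is just for illustration.

```python
from math import floor


def categorizer_sorted(values):
    # Group by floor(x * 100), then return the groups in ascending bin order.
    groups = {}
    for x in values:
        groups.setdefault(floor(x * 100), []).append(x)
    return [groups[key] for key in sorted(groups)]


lst = [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555]
result = categorizer_sorted(lst)
# result == [[0.0336, 0.0345, 0.0394], [0.0432, 0.044], [0.0555]]
```

Within each group the values keep their input order, since the input list itself is never sorted.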

Upvotes: 2

Brent Washburne

Reputation: 13148

def bin(values):
    bins = [[] for _ in range(10)]    # create ten bins
    for n in values:
        b = int(n * 100)              # normalize the value to the bin number
        bins[b].append(n)             # add the number to the bin
    return bins

d =  [0.0432,
  0.0336,
  0.0345,
  0.044,
  0.0394,
  0.0555]
print(bin(d))

The result is:

[[], [], [], [0.0336, 0.0345, 0.0394], [0.0432, 0.044], [0.0555], [], [], [], []]
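To run this over the whole dictionary from the question, something like the sketch below might work. It assumes ten bins of width 0.01 and quietly skips any value outside that range, which is a choice made here and not something the question specifies. The names bin_list and binned and the keys "a" and "b" are made up for illustration.

```python
import math


def bin_list(values, n_bins=10, width=0.01):
    # Fixed-size list of bins; bin i holds values in [i * width, (i + 1) * width).
    bins = [[] for _ in range(n_bins)]
    for v in values:
        index = int(math.floor(v / width))
        if 0 <= index < n_bins:   # assumption: out-of-range values are dropped
            bins[index].append(v)
    return bins


# Sample data copied from the question; the keys are placeholders.
d = {
    "a": [0.0432, 0.0336, 0.0345, 0.044, 0.0394, 0.0555],
    "b": [0.0236, 0.0333, 0.0571],
}

binned = {key: bin_list(values) for key, values in d.items()}
# binned["b"] == [[], [], [0.0236], [0.0333], [], [0.0571], [], [], [], []]
```

Without the range check, a value of 0.1 or more would raise an IndexError, and a negative index would silently land in a bin counted from the end of the list.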

Upvotes: 2
