Reputation: 11204
I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing, but it almost always requires massaging: sorting the items first and catching the iterators before they're consumed.
Is there any quick way to get this behavior, either through a standard Python module or a simple Python idiom?
>>> bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
{False: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p',
's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g'],
True: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']}
>>> bucket(xrange(21), lambda x: x % 10)
{0: [0, 10, 20],
1: [1, 11],
2: [2, 12],
3: [3, 13],
4: [4, 14],
5: [5, 15],
6: [6, 16],
7: [7, 17],
8: [8, 18],
9: [9, 19]}
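For comparison, here is roughly what the sort-then-groupby massaging described above looks like. This is a sketch, not a standard function: `bucket_via_groupby` is a hypothetical name, and it uses Python 3's `range` in place of `xrange`.

```python
from itertools import groupby

def bucket_via_groupby(seq, key):
    """Group items of seq by key(item) using sort + groupby.

    groupby only merges *adjacent* items with equal keys, so the input is
    sorted by the same key first. sorted() is stable, so items keep their
    original relative order inside each group. Keys must be orderable.
    """
    ordered = sorted(seq, key=key)
    # list(g) materialises each group before groupby moves on and the
    # previous group's iterator stops yielding items.
    return {k: list(g) for k, g in groupby(ordered, key=key)}

by_last_digit = bucket_via_groupby(range(21), lambda x: x % 10)
```

The two extra steps (the sort and the `list(g)` inside the comprehension) are exactly the massaging the question would like to avoid.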
Upvotes: 19
Views: 37298
Reputation: 633
You can use generator like this "yield" means, that function returns on every go through
array.pop cut last element from array
def get_bucket(arr, n):
    # pop() removes from the end, so arr is emptied back to front
    while arr:
        bucket = []
        for _ in range(min(n, len(arr))):
            bucket.append(arr.pop())
        yield bucket
for b in get_bucket(list(range(123)), 5):
    print(b)
[122, 121, 120, 119, 118]
[117, 116, 115, 114, 113]
[112, 111, 110, 109, 108]
[107, 106, 105, 104, 103]
[102, 101, 100, 99, 98]
[97, 96, 95, 94, 93]
[92, 91, 90, 89, 88]
[87, 86, 85, 84, 83]
[82, 81, 80, 79, 78]
[77, 76, 75, 74, 73]
[72, 71, 70, 69, 68]
[67, 66, 65, 64, 63]
[62, 61, 60, 59, 58]
[57, 56, 55, 54, 53]
[52, 51, 50, 49, 48]
[47, 46, 45, 44, 43]
[42, 41, 40, 39, 38]
[37, 36, 35, 34, 33]
[32, 31, 30, 29, 28]
[27, 26, 25, 24, 23]
[22, 21, 20, 19, 18]
[17, 16, 15, 14, 13]
[12, 11, 10, 9, 8]
[7, 6, 5, 4, 3]
[2, 1, 0]
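get_bucket splits by position into fixed-size chunks and consumes the list it is given. If you want chunks in the original order that leave the input untouched, a sketch using itertools.islice (the name `chunks` is hypothetical):

```python
from itertools import islice

def chunks(iterable, n):
    """Yield lists of up to n consecutive items; the input is not modified."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # next n items, or fewer at the end
        if not chunk:
            return
        yield chunk

data = list(range(12))
first_three = list(chunks(data, 5))
```

Here `list(chunks(range(12), 5))` gives `[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]`. On Python 3.12+, `itertools.batched(iterable, n)` does the same job but yields tuples.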
Upvotes: 0
Reputation: 7752
If it's a pandas.DataFrame, the following also works, using pd.cut():
from sklearn import datasets
import pandas as pd
# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0]) # we'll just take the first feature
# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)
yielding
0 sepal_length_(cm)_1
1 sepal_length_(cm)_0
2 sepal_length_(cm)_0
[...]
If you don't set the labels, the output looks like this:
0 (5.02, 5.74]
1 (4.296, 5.02]
2 (4.296, 5.02]
[...]
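Without pandas, the idea behind pd.cut(..., bins=n_bins) (equal-width bins spanning the data's range) can be approximated with the standard library. This is only a sketch under that assumption, not pd.cut's exact behaviour: pd.cut's intervals are right-closed and its range is widened slightly to include the minimum, whereas this hypothetical `equal_width_bins` helper uses left-closed bins and puts the maximum into the last bin.

```python
from collections import defaultdict

def equal_width_bins(values, n_bins):
    """Hypothetical helper: bucket numbers into n_bins equal-width bins.

    Bins are left-closed [edge_i, edge_i+1), except that the maximum value
    goes into the last bin. Returns {bin_index: [values...]}.
    """
    values = list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    buckets = defaultdict(list)
    for v in values:
        # If all values are equal, width is 0: put everything in bin 0.
        i = min(int((v - lo) / width), n_bins - 1) if width else 0
        buckets[i].append(v)
    return dict(buckets)
```

For example, `equal_width_bins(range(11), 5)` uses bins of width 2 and returns `{0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [8, 9, 10]}`.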
Upvotes: 2
Reputation: 2348
Here's a variant of partition() from above for when the predicate is boolean, avoiding the cost of a dict/defaultdict:
def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing
Example usage:
>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]
Upvotes: 2
Reputation: 353359
This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library. That said, I was surprised a few weeks ago by accumulate, so who knows what's lurking there these days? :^)
When I need this behaviour, I use
from collections import defaultdict
def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d
and get on with my day.
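For reference, the partition recipe mentioned above splits by a boolean predicate rather than by an arbitrary key. A sketch adapted from the itertools documentation, renamed here to the hypothetical `recipe_partition` so it does not clash with the dict-based partition():

```python
from itertools import filterfalse, tee

def recipe_partition(pred, iterable):
    """Return (items where pred is false, items where pred is true).

    Both results are lazy iterators fed from one pass over the input via
    tee(). pred is evaluated by each filter, i.e. twice per item when both
    results are consumed.
    """
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

consonants, vowels = recipe_partition(lambda c: c in 'aeiou',
                                      'thequickbrownfoxjumpsoverthelazydog')
```

Unlike the defaultdict version, this only handles two buckets, but it works lazily on large or infinite inputs as long as both iterators are consumed at a similar pace (otherwise tee buffers the difference).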
Upvotes: 23
Reputation: 3009
Edit:
Using DSM's answer as a start, here is a slightly more concise, general answer:
d = defaultdict(list)
map(lambda x: d[x in 'aeiou'].append(x), 'thequickbrownfoxjumpsoverthelazydog')
or
d = defaultdict(list)
map(lambda x: d[x % 10].append(x), xrange(21))
Here is a two-liner:
d = {False: [], True: []}
filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs")
Which can of course be made a one-liner:
d = {False: [], True: []}; filter(lambda x: d[True].append(x) if x in 'aeiou' else d[False].append(x), "thequickbrownfoxjumpedoverthelazydogs")
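These snippets depend on Python 2, where map and filter build their result lists immediately. In Python 3 both return lazy iterators, so the lambdas (and their append calls) never run unless something consumes the iterator, and xrange no longer exists. A Python 3 sketch of the map version, forcing evaluation with the zero-length deque "consume" idiom from the itertools recipes:

```python
from collections import defaultdict, deque

d = defaultdict(list)
# map() is lazy in Python 3; a deque with maxlen=0 drains the iterator so
# every append actually runs, without keeping the None results around.
deque(map(lambda x: d[x in 'aeiou'].append(x),
          'thequickbrownfoxjumpsoverthelazydog'), maxlen=0)

buckets = defaultdict(list)
deque(map(lambda x: buckets[x % 10].append(x), range(21)), maxlen=0)
```

The same applies to the filter one-liners. A plain for loop does the same work more readably.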
Upvotes: -1
Reputation: 13508
Here is a simple two-liner:
d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)
Edit:
Just adding your other case for completeness.
d = {}
for x in xrange(21): d.setdefault(x % 10, []).append(x)
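Wrapped into the bucket(seq, key) signature from the question (with Python 3's range standing in for xrange), the same setdefault idiom gives a reusable helper. This is a sketch built on the answer, not code from it:

```python
def bucket(seq, key):
    """Group items of seq into a dict keyed by key(item), keeping input order."""
    d = {}
    for x in seq:
        # setdefault inserts an empty list the first time a key is seen
        d.setdefault(key(x), []).append(x)
    return d

digits = bucket(range(21), lambda x: x % 10)
vowel_split = bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
```

`bucket(range(21), lambda x: x % 10)` returns the `{0: [0, 10, 20], 1: [1, 11], ...}` dict shown in the question. One small cost: the `[]` argument is built on every call, even when the key already exists, which is what defaultdict avoids.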
Upvotes: 6