I have a column of values like this:
col
12
76
34
for which I need to generate a new column holding a bucket label for each value of col,
as shown below:
col bucket-labels
12 8-16
76 64-128
34 32-64
The values in the column may vary, and so may the number of rows.
Edit: the bucket boundaries should be powers of two (2^n).
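To pin down the rule the expected output follows: a value x goes into the bucket (2^(n-1), 2^n], where 2^n is the smallest power of two that is at least x. A minimal pure-Python sketch of that rule for a single value, assuming plain integers >= 2 (so the lower edge is never below 1); `bucket_label` is a hypothetical helper name, not part of pandas.

```python
def bucket_label(x: int) -> str:
    """Return the power-of-two bucket label 'lo-hi' with lo < x <= hi.

    Assumption: x is a plain Python int >= 2, so lo = hi // 2 is at least 1.
    """
    if x < 2:
        raise ValueError("bucket_label expects an integer >= 2")
    # (x - 1).bit_length() is the smallest n with 2**n >= x
    hi = 1 << (x - 1).bit_length()
    lo = hi >> 1
    return f"{lo}-{hi}"


for value in [12, 76, 34]:
    print(value, bucket_label(value))
```

Applied to a pandas column, the values would need converting to plain ints first, e.g. `df['col'].map(lambda v: bucket_label(int(v)))`.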
Using pd.cut with power-of-2 bins:
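import numpy as np
import pandas as pd
df = pd.DataFrame({'col': [12, 76, 34]})
bins = [2**i for i in range(0, int(np.log2(df.col.max())) + 2)]
# alternative (np.ceil returns a float, so cast it to int before passing it to range):
# bins = [2**i for i in range(0, int(np.ceil(np.log2(df.col.max()))) + 1)]
bin_labels = [f'{x}-{y}' for x, y in zip(bins[:-1], bins[1:])]
df['bucket-labels'] = pd.cut(df.col, bins=bins, labels=bin_labels)
print(df)
   col bucket-labels
0   12          8-16
1   76        64-128
2   34         32-64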
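A note on bin edges, as a sketch with hypothetical sample values: pd.cut builds right-closed intervals by default, so with bins starting at 1 a value equal to a power of two lands in the lower bucket (16 goes to 8-16), and a value of 1 falls outside every bin and becomes NaN. Passing `include_lowest=True` closes the first interval on the left so that 1 is labelled 1-2.

```python
import numpy as np
import pandas as pd

# Hypothetical edge-case data: 1 and 16 sit exactly on bin edges.
df = pd.DataFrame({'col': [1, 2, 16, 17]})

bins = [2**i for i in range(0, int(np.log2(df.col.max())) + 2)]
bin_labels = [f'{x}-{y}' for x, y in zip(bins[:-1], bins[1:])]

# Default right=True: intervals are (1, 2], (2, 4], ... so 1 is not covered.
default = pd.cut(df.col, bins=bins, labels=bin_labels)

# include_lowest=True closes the first interval on the left: [1, 2].
inclusive = pd.cut(df.col, bins=bins, labels=bin_labels, include_lowest=True)

print(pd.DataFrame({'col': df.col, 'default': default, 'inclusive': inclusive}))
```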
First get the smallest power of 2 that is greater than or equal to the maximal value (using one of the solutions from here), then create the bins with a list comprehension and the labels with zip, and pass both to the cut function:
import math
import pandas as pd
df = pd.DataFrame({'col': [12, 76, 34]})
a = df['col'].max()
# math.log2 is usually more accurate than math.log(a, 2)
bins = [1 << exponent for exponent in range(math.ceil(math.log2(a)) + 1)]
# another solution, using integer arithmetic only:
# bins = [1 << exponent for exponent in range((int(a) - 1).bit_length() + 1)]
print(bins)
[1, 2, 4, 8, 16, 32, 64, 128]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df['bucket-labels'] = pd.cut(df['col'], bins=bins, labels=labels)
print(df)
   col bucket-labels
0   12          8-16
1   76        64-128
2   34         32-64
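The floating-point logarithm route works for these values, but a logarithm computed in floating point can carry rounding error, so the `bit_length` alternative is the exact option. A minimal sketch wrapping it in a helper, assuming a positive integer maximum; `pow2_bins` and `ceil_log2_bruteforce` are hypothetical names, and the loop checks the identity (a - 1).bit_length() == ceil(log2(a)) against a brute-force reference.

```python
def pow2_bins(a):
    """Bin edges 1, 2, 4, ..., 2**n where 2**n is the smallest power of two >= a.

    Uses integer arithmetic only: for a >= 1, (a - 1).bit_length() == ceil(log2(a)).
    """
    n = (int(a) - 1).bit_length()
    return [1 << e for e in range(n + 1)]


def ceil_log2_bruteforce(a):
    """Smallest n with 2**n >= a, found by repeated doubling (reference check)."""
    n = 0
    while (1 << n) < a:
        n += 1
    return n


# Check the bit_length identity against the brute-force reference.
for a in range(1, 5000):
    assert (a - 1).bit_length() == ceil_log2_bruteforce(a)

print(pow2_bins(76))
print(pow2_bins(64))
```

With a maximum that is itself a power of two (64), the last edge is exactly that value, and 64 still falls in the right-closed interval (32, 64].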