Reputation: 2297
What I have right now looks like this: spread
0 0.00000787
1 0.00000785
2 0.00000749
3 0.00000788
4 0.00000786
5 0.00000538
6 0.00000472
7 0.00000759
And I would like to add a new column next to it, and if the value of spread in between (for example) 0 and 0.00005 then it is part of bin A, if (for example) between 0.00005 and 0.0006 then bin B (there are three bins in total). What I have tried so far:
minspread = df['spread'].min()
maxspread = df['spread'].max()
born = (float(maxspread)-float(minspread))/3
born1 = born + float(minspread)
born2 = float(maxspread) - born
df['Bin'] = df['spread'].apply(lambda x: 'A' if x < born1 else ( 'B' if born1 < x <= born2 else 'C'))
But when I do so everything ends up in the Bin A:
spread Bin
0 0.00000787 A
1 0.00000785 A
2 0.00000749 A
3 0.00000788 A
4 0.00000786 A
Does anyone have an idea on how to divide the column 'spread' in three bins (A-B-C) with the same number of observations in it? Thanks!
Upvotes: 1
Views: 2084
Reputation: 862581
If get error:
unsupported operand type(s) for +: 'decimal.Decimal' and 'float'
It means the column type is Decimal, which works poorly with pandas, and should be converted to numeric.
One possible solution is to multiply columns by some big number e.g. 10e15 and convert to integer to avoid lost precision if converting to floats and then use qcut
:
#sample data
#from decimal import Decimal
#df['spread'] = [Decimal(x) for x in df['spread']]
df['spread1'] = (df['spread'] * 10**15).astype(np.int64)
df['bins'] = pd.qcut(df['spread1'], 3, labels=list('ABC'))
print (df)
spread spread1 bins
0 0.00000787 7870000000 C
1 0.00000785 7850000000 B
2 0.00000749 7490000000 A
3 0.00000788 7880000000 C
4 0.00000786 7860000000 C
5 0.00000538 5380000000 A
6 0.00000472 4720000000 A
7 0.00000759 7590000000 B
Solution with no new column:
s = (df['spread'] * 10**15).astype(np.int64)
df['bins'] = pd.qcut(s, 3, labels=list('ABC'))
print (df)
spread bins
0 0.00000787 C
1 0.00000785 B
2 0.00000749 A
3 0.00000788 C
4 0.00000786 C
5 0.00000538 A
6 0.00000472 A
7 0.00000759 B
Upvotes: 3