Reputation: 2297
My data looks like this:
spread CPB% Bin
0 0.00000787 0.001270648030495552731893265565 B
1 0.00000785 0.003821656050955414012738853503 A
2 0.00000749 0.005821656050955414012738853503 C
3 0.00000788 0.004821656050955414012738853503 B
So I have basically assigned a letter A, B or C to each row according to the value of its spread. I did this using the following code:
s = (df['spread'] * 10**15).astype(np.int64)
df['Bin'] = pd.qcut(s, 3, labels=list('ABC'))
What I need to do now is this: I have 100 spreads (from 0.000001 to 0.0001) and I need to know whether they end up in bin A, B or C. Is there a way to find, let's say, the 'range' of each of the above quantiles?
More precisely I have the below spreads:
spread
0 0.000100
1 0.000109
2 0.000118
3 0.000127
4 0.000136
5 0.000145
How can I know whether they end up in the same A, B or C bins as above? Thanks
Upvotes: 1
Views: 2723
Reputation: 71
If you use:
df['bins'] = pd.qcut(df['your_split_col_name'], 3)
The output will show you the interval of each bin.
Passing labels hides those intervals.
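As a minimal sketch of what that looks like, the snippet below bins a small made-up column (the values and the column name are assumptions, not the asker's data) and then reads the interval edges back from the resulting categories:

```python
import pandas as pd

# Hypothetical sample values, not the asker's real data
df = pd.DataFrame({"spread": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Without labels, each row is tagged with the interval it falls into
df["bins"] = pd.qcut(df["spread"], 3)

# The categories of the result are the three quantile intervals
intervals = df["bins"].cat.categories
print(intervals)

# Each interval exposes its left and right edge
edges = [(iv.left, iv.right) for iv in intervals]
print(edges)
```

The edges read this way are rounded to the precision qcut uses for display, so they are best treated as a description of the bins rather than as exact cut points.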
EDIT
To use the resulting split, you could use the pandas groupby function.
df['bins'] = pd.qcut(df['your_split_col_name'], 3)
grouped = df.groupby('bins')
grouped.describe()
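If the labels A, B and C are kept, the same groupby idea can recover the observed range of each bin. This is only a sketch on made-up values; the column names `spread` and `Bin` mirror the question, and `observed=True` is passed so that only categories present in the data are reported:

```python
import pandas as pd

# Hypothetical sample values, not the asker's real data
df = pd.DataFrame({"spread": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
df["Bin"] = pd.qcut(df["spread"], 3, labels=list("ABC"))

# Smallest and largest spread that landed in each labelled bin
ranges = df.groupby("Bin", observed=True)["spread"].agg(["min", "max"])
print(ranges)
```

Note that this gives the smallest and largest values seen in each bin, not the exact quantile edges, so a new value lying between two bins is not covered by it.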
Upvotes: 1
Reputation: 862791
I believe you need to add the parameter retbins=True to qcut so that it also returns the bin edges, which makes it possible to reuse them in cut:
print (df1)
spread CPB% Bin
0 0.000008 0.001271 B
1 0.000008 0.003822 A
2 0.000007 0.005822 C
3 0.000008 0.004822 B
print (df2)
spread
0 0.000008 <- sample value changed so that it matches a bin
1 0.000109
2 0.000118
3 0.000127
4 0.000136
5 0.000145
s = (df1['spread'] * 10**15).astype(np.int64)
v,b = pd.qcut(s, 3, labels=list('ABC'),retbins=True)
print (v)
0 B
1 A
2 A
3 C
Name: spread, dtype: category
Categories (3, object): [A < B < C]
print (b)
[7490000000 7849999999 7869999999 7880000000]
s1 = (df2['spread'] * 10**15).astype(np.int64)
df2['new'] = pd.cut(s1, bins=b, labels=v.cat.categories)
print (df2)
spread new
0 0.000008 A
1 0.000109 NaN
2 0.000118 NaN
3 0.000127 NaN
4 0.000136 NaN
5 0.000145 NaN
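The NaN rows above are values that lie outside the edges learned from df1. By default cut also leaves out a value exactly equal to the smallest edge unless include_lowest=True is passed. The sketch below shows both behaviours on made-up numbers (the `train` and `new` series are assumptions), along with one optional way to make the outer bins open-ended so that out-of-range values fall into A or C. Whether that is appropriate depends on what the bins are meant to represent.

```python
import numpy as np
import pandas as pd

# Hypothetical training and new values, for illustration only
train = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
new = pd.Series([0.5, 1.0, 3.0, 6.0, 9.0])
labels = list("ABC")

# Learn the tercile edges once and keep them
binned, edges = pd.qcut(train, 3, labels=labels, retbins=True)

# Reusing the raw edges: values outside [1, 6] become NaN;
# include_lowest=True keeps a value equal to the smallest edge
strict = pd.cut(new, bins=edges, labels=labels, include_lowest=True)

# Optional: open the outer edges so out-of-range values go to A or C
open_edges = edges.copy()
open_edges[0], open_edges[-1] = -np.inf, np.inf
loose = pd.cut(new, bins=open_edges, labels=labels)

print(pd.DataFrame({"value": new, "strict": strict, "loose": loose}))
```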
Upvotes: 2