Reputation: 487
My original dataset looks like the sample below:
| id | old_a | new_a | old_b | new_b | ratio_a | ratio_b |
|----|-------|-------|-------|-------|----------|---------|
| 1 | 350 | 6 | 35 | 0 | 58.33333 | Inf |
| 2 | 164 | 79 | 6 | 2 | 2.075949 | 3 |
| 3 | 10 | 0 | 1 | 1 | Inf | 1 |
| 4 | 120 | 1 | 10 | 0 | 120 | Inf |
Here's the dataframe:
df=[[1,350,6,35,0],[2,164,79,6,2],[3,10,0,1,1],[4,120,1,10,0]]
df= pd.DataFrame(df,columns=['id','old_a','new_a','old_b','new_b'])
I have obtained the column 'ratio_a' and 'ratio_b' (as shown in the table) with the following code:
df['ratio_a']= df['old_a']/df['new_a']
df['ratio_b']= df['old_b']/df['new_b']
Next I wanted to create two more columns with number ranges in which the values of ratio_a and ratio_b would fall. For this purpose, I have written the following code:
bins = [0,10,20,30,40,50,60,70,80,90,100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)
One of the issue that I'm having is that if any value in ratio_a and ratio_b is greater than 100, it should fall in the bucket '>100'. How can I do that? My final result should look like one below:
| id | old_a | new_a | old_b | new_b | ratio_a | ratio_b | a_range | b_range |
|----|-------|-------|-------|-------|----------|---------|---------|---------|
| 1 | 350 | 6 | 35 | 0 | 58.33333 | Inf | 40-50 | NaN |
| 2 | 164 | 79 | 6 | 2 | 2.075949 | 3 | 0-10 | 0-10 |
| 3 | 10 | 0 | 1 | 1 | Inf | 1 | NaN | 0-10 |
| 4 | 120 | 1 | 10 | 0 | 120 | Inf | >100 | NaN |
Upvotes: 0
Views: 871
Reputation: 2583
One possible solution:
bins = [0,10,20,30,40,50,60,70,80,90,100,np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1]=">100"
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)
the result:
id old_a new_a old_b new_b ratio_a ratio_b a_range b_range
1 350 6 35 0 58.333333 inf 50-60 NaN
2 164 79 6 2 2.075949 3.0 0-10 0-10
3 10 0 1 1 inf 1.0 NaN 0-10
4 120 1 10 0 120.000000 inf >100 NaN
Upvotes: 1