Danish
Danish

Reputation: 2871

Create score columns based on the specific range values of another columns in pandas

I have a data as shown below

ID    U_30     U_90     U_180   Limit  Type
1     -25      -10      10      100    A
2     -50      -10      -60     200    A
3     -25       0       10      50     B
4      25       10      40      80     A
5     -60       80      1000    100    A
6     -800     -10      10      600    B
7     -25      -10      10      400    B
8     -25      -10      10      300    A

From the above I would like to create U_30_Score, U_90_Score and U_180_Score as shown below.

Example for U_30_Score

if U_30 in between -100 to -90 U_30_Score = 30
if U_30 in between -90 to -80, U_30_Score = 60
if U_30 in between -80 to -70 U_30_Score = 90
if U_30 in between -70 to -60, U_30_Score = 120
if U_30 in between -60 to -50, U_30_Score = 150
if U_30 in between -50 to -40 U_30_Score = 180
if U_30 in between -40 to -30, U_30_Score = 210
if U_30 in between -30 to -20, U_30_Score = 240
if U_30 in between -20 to -10 U_30_Score = 270
if U_30 in between -10 to 0, U_30_Score = 300
if U_30 is less than -100, U_30_Score = 0
if U_30 in between 90 to 100 U_30_Score = 400
if U_30 in between 80 to 90, U_30_Score = 390
if U_30 in between 70 to 80 U_30_Score = 380
if U_30 in between 60 to 70, U_30_Score = 370
if U_30 in between 50 to 60, U_30_Score = 360
if U_30 in between 40 to 50 U_30_Score = 350
if U_30 in between 30 to 40, U_30_Score = 340
if U_30 in between 20 to 30, U_30_Score = 330
if U_30 in between 10 to 20 U_30_Score = 320
if U_30 in between 0 to 10, U_30_Score = 310
if U_30 is more than 100, U_30_Score = 410

The above condition have to be applied for U_90 and U_180 to create U_90_Score and U_180_Score respectively.

Expected Output:

ID    U_30     U_90     U_180   Limit  Type   U_30_Score    U_90_Score    U_180_Score
1     -25      -10      10      100    A      240           300           320
2     -50      -10      -60     200    A      180           300           150
3     -25       0       10      50     B      240           310           320           
4      25       10      40      80     A      330           320           350
5     -60       80      1000    100    A      150           390           410
6     -800     -10      10      600    B      0             300           320
7     -25      -10      10      400    B      240           300           320
8     -25      -10      10      300    A      240           300           320

Upvotes: 0

Views: 169

Answers (3)

Jeff
Jeff

Reputation: 634

I would suggest np.digitize that automatically bins your data into a bin then multiply your score scaling from there.

func = np.vectorize(lambda x: x*30 if x<11 else x*10+200)
func(
    np.digitize(
            [-25,-50,-25,25,-60,-800,-25,-25], #input column here
            np.concatenate( #define your variably sized bins
                [
                    [-float('inf')],
                    np.arange(-100,110,10),
                    [float('inf')]
                ]
            )
    )-1
).clip(max=410)

output:

array([240, 180, 240, 330, 150,   0, 240, 240], dtype=int64)

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use np.r_ to create bins and labels then use pd.cut to categorize the specified columns based on these bins:

from functools import partial

f = partial(pd.cut, bins=np.r_[-np.inf, -100:110:10, np.inf], 
            labels=np.r_[0:330:30, 310:420:10], right=False)

c = df.filter(like='U_').columns
df[c + '_Score'] = df[c].apply(f)

Result:

   ID  U_30  U_90  U_180  Limit Type U_30_Score U_90_Score U_180_Score
0   1   -25   -10     10    100    A        240        300         320
1   2   -50   -10    -60    200    A        180        300         150
2   3   -25     0     10     50    B        240        310         320
3   4    25    10     40     80    A        330        320         350
4   5   -60    80   1000    100    A        150        390         410
5   6  -800   -10     10    600    B          0        300         320
6   7   -25   -10     10    400    B        240        300         320
7   8   -25   -10     10    300    A        240        300         320

Upvotes: 4

YangSeungwon
YangSeungwon

Reputation: 84

if U_30 < 0:
    U_30_Score = max(300 + 30*(U_30//10+1), 0)
else:
    U_30_Score = min(310 + 10*(U_30//10), 410)

This could work because your score distribution has some kind of rule. max and min function makes it doesn't go to out of bound.

EDIT: output(test)

>>> def func(U_30):
...     if U_30 < 0:
...         U_30_Score = max(300 + 30*(U_30//10+1), 0)
...     else:
...         U_30_Score = min(310 + 10*(U_30//10), 410)
...     print(f"U_30 : {U_30}, U_30_Score : {U_30_Score}")
... 
>>> for num in range(-110, 110, 5):
...     func(num)
... 
U_30 : -110, U_30_Score : 0
U_30 : -105, U_30_Score : 0
U_30 : -100, U_30_Score : 30
U_30 : -95, U_30_Score : 30
U_30 : -90, U_30_Score : 60
U_30 : -85, U_30_Score : 60
U_30 : -80, U_30_Score : 90
U_30 : -75, U_30_Score : 90
U_30 : -70, U_30_Score : 120
U_30 : -65, U_30_Score : 120
U_30 : -60, U_30_Score : 150
U_30 : -55, U_30_Score : 150
U_30 : -50, U_30_Score : 180
U_30 : -45, U_30_Score : 180
U_30 : -40, U_30_Score : 210
U_30 : -35, U_30_Score : 210
U_30 : -30, U_30_Score : 240
U_30 : -25, U_30_Score : 240
U_30 : -20, U_30_Score : 270
U_30 : -15, U_30_Score : 270
U_30 : -10, U_30_Score : 300
U_30 : -5, U_30_Score : 300
U_30 : 0, U_30_Score : 310
U_30 : 5, U_30_Score : 310
U_30 : 10, U_30_Score : 320
U_30 : 15, U_30_Score : 320
U_30 : 20, U_30_Score : 330
U_30 : 25, U_30_Score : 330
U_30 : 30, U_30_Score : 340
U_30 : 35, U_30_Score : 340
U_30 : 40, U_30_Score : 350
U_30 : 45, U_30_Score : 350
U_30 : 50, U_30_Score : 360
U_30 : 55, U_30_Score : 360
U_30 : 60, U_30_Score : 370
U_30 : 65, U_30_Score : 370
U_30 : 70, U_30_Score : 380
U_30 : 75, U_30_Score : 380
U_30 : 80, U_30_Score : 390
U_30 : 85, U_30_Score : 390
U_30 : 90, U_30_Score : 400
U_30 : 95, U_30_Score : 400
U_30 : 100, U_30_Score : 410
U_30 : 105, U_30_Score : 410

Upvotes: 1

Related Questions