fjurt
fjurt

Reputation: 793

Relative weight within row with max value in python DataFrame

I try to calculate the relative weights of df1 in each row with a max value of 0.5. So far, I was able to calculate the relative weights in df2 but without an upper boundary. Here would be a simple example:

import pandas as pd
df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1':[0,0,2,1,1], 
    'ID2':[1,3,1,1,2],
    'ID3':[1,0,0,1,0],
    'ID4':[1,1,7,1,0],
    'ID5':[0,6,0,0,1]})

df1:
    Dates       ID1 ID2 ID3 ID4 ID5
0   2021-01-01  0   1   1   1   0
1   2021-01-02  0   3   0   1   6
2   2021-01-03  2   1   0   7   0
3   2021-01-04  1   1   1   1   0
4   2021-01-05  1   2   0   0   1
df1 = df1.set_index('Dates').T
df2 = df1.transform(lambda x: x/sum(x)).T
df2.round(2)

df2:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.60
2021-01-03  0.20    0.10    0.00    0.70    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

I try to get df3 with a relative weight maximum of 0.5.

df3:            
Dates       ID1     ID2     ID3     ID4     ID5         
2021-01-01  0.00    0.33    0.33    0.33    0.00
2021-01-02  0.00    0.30    0.00    0.10    0.50
2021-01-03  0.20    0.10    0.00    0.50    0.00
2021-01-04  0.25    0.25    0.25    0.25    0.00
2021-01-05  0.25    0.50    0.00    0.00    0.25

When I use the following adjusted function, I get the error: Transform function failed

df1.transform(lambda x: x/sum(x) if x/sum(x) < 0.5 else 0.5).T

Thanks a lot!

Upvotes: 1

Views: 216

Answers (3)

Krishnakanth Allika
Krishnakanth Allika

Reputation: 790

Instead of transposing and applying transformations on each element, we can manipulate rows directly.

df3 = df1.copy().set_index('Dates')
df3 = df3.div(df3.sum(axis=1), axis=0).clip(upper=0.5).round(2).reset_index()

Output:

        Dates   ID1   ID2   ID3   ID4   ID5
0  2021-01-01  0.00  0.33  0.33  0.33  0.00
1  2021-01-02  0.00  0.30  0.00  0.10  0.50
2  2021-01-03  0.20  0.10  0.00  0.50  0.00
3  2021-01-04  0.25  0.25  0.25  0.25  0.00
4  2021-01-05  0.25  0.50  0.00  0.00  0.25

Would this work for you?

Upvotes: 1

Danila Musaev
Danila Musaev

Reputation: 143

for col in df1.columns:
   df1[col] = df1[col].apply(lambda x: x/sum(df1[col]) if x/sum(df1[col]) < 0.5 else 0.5)

Have fun!

Upvotes: 1

nikeros
nikeros

Reputation: 3379

You can use apply(...,axis=1) and clip the values with a max of 0.5 (this assumes Date is always the first columns - alternatively, we could set it as an index):

df1[df1.columns[1:]] = df1[df1.columns[1:]].apply(lambda x:x/sum(x), axis=1).clip(upper=0.5)

Upvotes: 1

Related Questions