Reputation: 793
I try to calculate the relative weights of df1
in each row with a max value of 0.5. So far, I was able to calculate the relative weights in df2
but without an upper boundary. Here would be a simple example:
import pandas as pd
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'ID1':[0,0,2,1,1],
'ID2':[1,3,1,1,2],
'ID3':[1,0,0,1,0],
'ID4':[1,1,7,1,0],
'ID5':[0,6,0,0,1]})
df1:
Dates ID1 ID2 ID3 ID4 ID5
0 2021-01-01 0 1 1 1 0
1 2021-01-02 0 3 0 1 6
2 2021-01-03 2 1 0 7 0
3 2021-01-04 1 1 1 1 0
4 2021-01-05 1 2 0 0 1
df1 = df1.set_index('Dates').T
df2 = df1.transform(lambda x: x/sum(x)).T
df2.round(2)
df2:
Dates ID1 ID2 ID3 ID4 ID5
2021-01-01 0.00 0.33 0.33 0.33 0.00
2021-01-02 0.00 0.30 0.00 0.10 0.60
2021-01-03 0.20 0.10 0.00 0.70 0.00
2021-01-04 0.25 0.25 0.25 0.25 0.00
2021-01-05 0.25 0.50 0.00 0.00 0.25
I try to get df3
with a relative weight maximum of 0.5.
df3:
Dates ID1 ID2 ID3 ID4 ID5
2021-01-01 0.00 0.33 0.33 0.33 0.00
2021-01-02 0.00 0.30 0.00 0.10 0.50
2021-01-03 0.20 0.10 0.00 0.50 0.00
2021-01-04 0.25 0.25 0.25 0.25 0.00
2021-01-05 0.25 0.50 0.00 0.00 0.25
When I use the following adjusted function, I get the error: Transform function failed
df1.transform(lambda x: x/sum(x) if x/sum(x) < 0.5 else 0.5).T
Thanks a lot!
Upvotes: 1
Views: 216
Reputation: 790
Instead of transposing and applying transformations on each element, we can manipulate rows directly.
df3 = df1.copy().set_index('Dates')
df3 = df3.div(df3.sum(axis=1), axis=0).clip(upper=0.5).round(2).reset_index()
Output:
Dates ID1 ID2 ID3 ID4 ID5
0 2021-01-01 0.00 0.33 0.33 0.33 0.00
1 2021-01-02 0.00 0.30 0.00 0.10 0.50
2 2021-01-03 0.20 0.10 0.00 0.50 0.00
3 2021-01-04 0.25 0.25 0.25 0.25 0.00
4 2021-01-05 0.25 0.50 0.00 0.00 0.25
Would this work for you?
Upvotes: 1
Reputation: 143
for col in df1.columns:
df1[col] = df1[col].apply(lambda x: x/sum(df1[col]) if x/sum(df1[col]) < 0.5 else 0.5)
Have fun!
Upvotes: 1
Reputation: 3379
You can use apply(...,axis=1)
and clip
the values with a max of 0.5 (this assumes Date
is always the first columns - alternatively, we could set it as an index):
df1[df1.columns[1:]] = df1[df1.columns[1:]].apply(lambda x:x/sum(x), axis=1).clip(upper=0.5)
Upvotes: 1