Reputation: 31
I have the following dataset. I need Col A * B * C * D * E
in a new column Score
based on the conditions:
Where if ID
& VID
is same, but QID
is unique or different, choose the higher value from Col E
in calculation of the new column Score
ID VID QID A B C D E
121 212 123 1 2 1 1 1
121 212 435 1 2 1 1 5
223 244 567 2 3 5 1 2
313 232 709 5 1 2 1 3
313 232 887 5 1 2 1 2
454 969 457 1 3 2 2 4
454 969 457 1 2 1 2 4
Expected outcome is something like :
ID VID QID A B C D E Score
121 212 123 1 2 1 1 1 10 (because based on ID and VID I want to choose higher value in column E while calculating, and apply the same to those rows)
121 212 435 1 2 1 1 5 10
223 244 567 2 3 5 1 2 60
313 232 709 5 1 2 1 3 30
313 232 887 5 1 2 1 2 30
454 969 457 1 3 2 2 4 48
454 969 457 1 2 1 2 4 16
I have tried .sort
in order to bring Col E
in a descending or ascending format and then calculating but couldn't write the logic behind the calculation. Just a beginner trying to work this problem for few days now.
Upvotes: 0
Views: 70
Reputation: 862511
Use DataFrame.transform
for overwite column E
by max values per groups and then multiple by DataFrame.prod
all columns with omit first 3 selected DataFrame.iloc
:
df['Score'] = (df.assign(E = df.groupby(['ID','VID'])['E'].transform('max'))
.iloc[:, 3:]
.prod(axis=1))
print (df)
ID VID QID A B C D E Score
0 121 212 123 1 2 1 1 1 10
1 121 212 435 1 2 1 1 5 10
2 223 244 567 2 3 5 1 2 60
3 313 232 709 5 1 2 1 3 30
4 313 232 887 5 1 2 1 2 30
5 454 969 457 1 3 2 2 4 48
6 454 969 457 1 2 1 2 4 16
Details:
print (df.groupby(['ID','VID'])['E'].transform('max'))
0 5
1 5
2 2
3 3
4 3
5 4
6 4
Name: E, dtype: int64
Upvotes: 1
Reputation: 379
I might be over-complicating the solution by a bit but this is how a quick solution would look like to me:
Part 1: Calculate Unique Count of QID and Max of E
df_tempA = df.groupby(["ID", "VID"]).agg({
'QID':"nunique",
'E': "max"
}).reset_index().rename(columns={'QID':"QIDCount",'E':"MaxE"})
print(df_tempA)
ID VID QIDCount MaxE
0 121 212 2 5
1 223 244 1 2
2 313 232 2 3
3 454 969 1 4
Part 2: Join with the original dataframe
df_result = pd.merge(df, df_tempA, on=["ID", "VID"])
print(df_result)
ID VID QID A B C D E QIDCount MaxE
0 121 212 123 1 2 1 1 1 2 5
1 121 212 435 1 2 1 1 5 2 5
2 223 244 567 2 3 5 1 2 1 2
3 313 232 709 5 1 2 1 3 2 3
4 313 232 887 5 1 2 1 2 2 3
5 454 969 457 1 3 2 2 4 1 4
6 454 969 457 1 2 1 2 4 1 4
Part 3: Create Score column based on conditions
df_result['Score'] = df_result.apply(lambda df: df['A']*df['B']*df['C']*df['D']*df['E']
if df['QIDCount'] ==1
else df['A']*df['B']*df['C']*df['D']*df['MaxE'],
axis=1)
print(df_result)
ID VID QID A B C D E QIDCount MaxE Score
0 121 212 123 1 2 1 1 1 2 5 10
1 121 212 435 1 2 1 1 5 2 5 10
2 223 244 567 2 3 5 1 2 1 2 60
3 313 232 709 5 1 2 1 3 2 3 30
4 313 232 887 5 1 2 1 2 2 3 30
5 454 969 457 1 3 2 2 4 1 4 48
6 454 969 457 1 2 1 2 4 1 4 16
Upvotes: 2