creating a new column on if condition from different columns

Question

I have the following dataset. I need Col A * B * C * D * E in a new column Score based on the conditions:

Where if ID & VID is same, but QID is unique or different, choose the higher value from Col E in calculation of the new column Score

ID     VID     QID     A     B     C     D     E
121    212     123     1     2     1     1     1
121    212     435     1     2     1     1     5
223    244     567     2     3     5     1     2
313    232     709     5     1     2     1     3
313    232     887     5     1     2     1     2
454    969     457     1     3     2     2     4
454    969     457     1     2     1     2     4

Expected outcome is something like :

ID     VID     QID     A     B     C     D     E     Score
121    212     123     1     2     1     1     1     10 (because based on ID and VID I want to choose higher value in column E while calculating, and apply the same to those rows)
121    212     435     1     2     1     1     5     10
223    244     567     2     3     5     1     2     60
313    232     709     5     1     2     1     3     30
313    232     887     5     1     2     1     2     30
454    969     457     1     3     2     2     4     48
454    969     457     1     2     1     2     4     16

I have tried .sort in order to bring Col E in a descending or ascending format and then calculating but couldn't write the logic behind the calculation. Just a beginner trying to work this problem for few days now.

frank · Accepted Answer

I might be over-complicating the solution by a bit but this is how a quick solution would look like to me:

Part 1: Calculate Unique Count of QID and Max of E

df_tempA = df.groupby(["ID", "VID"]).agg({
    'QID':"nunique",
    'E': "max"
}).reset_index().rename(columns={'QID':"QIDCount",'E':"MaxE"})

print(df_tempA)   
    ID  VID  QIDCount  MaxE
0  121  212         2     5
1  223  244         1     2
2  313  232         2     3
3  454  969         1     4

Part 2: Join with the original dataframe

df_result = pd.merge(df, df_tempA, on=["ID", "VID"])

print(df_result)
    ID  VID  QID  A  B  C  D  E  QIDCount  MaxE
0  121  212  123  1  2  1  1  1         2     5
1  121  212  435  1  2  1  1  5         2     5
2  223  244  567  2  3  5  1  2         1     2
3  313  232  709  5  1  2  1  3         2     3
4  313  232  887  5  1  2  1  2         2     3
5  454  969  457  1  3  2  2  4         1     4
6  454  969  457  1  2  1  2  4         1     4

Part 3: Create Score column based on conditions

df_result['Score'] = df_result.apply(lambda df: df['A']*df['B']*df['C']*df['D']*df['E'] 
                                    if df['QIDCount'] ==1 
                                    else df['A']*df['B']*df['C']*df['D']*df['MaxE'], 
                                    axis=1)

print(df_result)
    ID  VID  QID  A  B  C  D  E  QIDCount  MaxE  Score
0  121  212  123  1  2  1  1  1         2     5     10
1  121  212  435  1  2  1  1  5         2     5     10
2  223  244  567  2  3  5  1  2         1     2     60
3  313  232  709  5  1  2  1  3         2     3     30
4  313  232  887  5  1  2  1  2         2     3     30
5  454  969  457  1  3  2  2  4         1     4     48
6  454  969  457  1  2  1  2  4         1     4     16

creating a new column on if condition from different columns

Answers (2)

Related Questions