Reputation: 169
I am a newcomer to Python and I am needing a solution for this example below, This is how my df looks like,
I need to return just two records one for ID_Number 207921021 and one for ID_Number 222037001. Using the below code I was able to get one record for ID_Number 222037001 based on max value of Score 1 and Score 2 when the Action is the same ('Invest' in this case),
Data=Data.groupby(['ID_Number','Action'])['Score_1','Score_2'].max().reset_index()
Below was my result,
Is there way to just to get one record for ID_Number 207921021 as well (Two separate values in the action column this time i.e. 'Use' and 'Avoid')? I have used Window Functions in SQL i.e using QUALIFY RANK() OVER (PARTITION BY ID_number, Action ORDER BY Score_1,Score_2 DESC)=1
to get this done previously.
Is there a similar function in Python to do this?
Upvotes: 1
Views: 1348
Reputation: 3288
This will pick one row per ID_Number
with with sorting you defined.
df.sort_values(by=['Score_2', 'Score_1'], ascending=[False, True]).groupby(['ID_Number']).head(1)
Output:
Action ID_Number Score_1 Score_2
3 Invest 222037001 9 0.4600
0 Use 207821021 7 0.4525
Upvotes: 2