Reputation: 329
I have the following dataframe :
Date Code Name Days
23/09/2019 WO Smith Thomas 1
23/09/2019 RE Johnson Robert 1
24/09/2019 WO Smith Thomas 1
24/09/2019 NW Smith Thomas 0.5
25/09/2019 OP Johnson Robert 0.5
25/09/2019 GF Johnson Robert 0.5
In case of duplicates based on the columns Date and Name, I would like to check if the sum of days is > 1. If yes then I would like to substrate the biggest value with the other but don't delete any rows, otherwise do nothing.
The desired output would be :
Date Code Name Days
23/09/2019 WO Smith Thomas 1
23/09/2019 RE Johnson Robert 1
24/09/2019 WO Smith Thomas 0.5
24/09/2019 NW Smith Thomas 0.5
25/09/2019 OP Johnson Robert 0.5
25/09/2019 GF Johnson Robert 0.5
Is there any Pandas way to do it?
I could loop using Python but it doesn't seems efficient
Thanks
Upvotes: 0
Views: 726
Reputation: 841
without more details on what the numerical operation you want to perform, the only way to get the result you want is the following:
df.Days = df.groupby(['Date', 'Name']).Days.transform(
lambda x: x if len(x) == 1 else
[max(i - min(x), min(x)) if sum(x) > 1 else i for i in x])
The reasoning behind is:
Date
, Name
basisDays
column and checks if there are duplicates Date Code Name Days
0 23/09/2019 WO Smith Thomas 1.0
1 23/09/2019 RE Johnson Robert 1.0
2 24/09/2019 WO Smith Thomas 0.5
3 24/09/2019 NW Smith Thomas 0.5
4 25/09/2019 OP Johnson Robert 0.5
5 25/09/2019 GF Johnson Robert 0.5
Upvotes: 1