Reputation: 109
I have the following table with fictive data:
I want to remove any duplicate rows and keep only the row which contains a positive value in "Won Turnover". Hence, the two rows marked with red should be removed in this case
Moreover, if there are duplicate rows with only Lost Turnover, then the row with the highest turnover should be kept (The bottom two rows).
Upvotes: 2
Views: 853
Reputation: 862581
First test Won Turnover
if only missing value per groups with GroupBy.all
and test only max
value per Lost Turnover
. Chain by &
for bitwise AND
and add new condition for return all not missing rows per Won Turnover
with |
for bitwise OR
:
m1 = (df.assign(new = df['Won Turnover'].isna())
.groupby(['Date','Supplier','Customer'])['new'].transform('all'))
m2 = (df.groupby(['Date','Supplier','Customer'])['Lost Turnover'].transform('max')
.eq(df['Lost Turnover']))
df = df[(m1 & m2) | df['Won Turnover'].notna()]
print (df)
Date Supplier Customer Won Turnover Lost Turnover
1 25.06.2019 Nike Pepsi 25000.0 NaN
2 25.06.2019 Nike McDonalds 10000.0 NaN
3 25.06.2019 Adidas Coca Cola 12000.0 NaN
5 25.06.2019 Adidas McDonalds 35000.0 NaN
6 25.06.2019 Adidas Pepsi NaN 15000.0
Upvotes: 0
Reputation: 27869
Maybe this can do it:
df.sort_values(['Won Turnover', 'Lost Turnover'], ascending=False).drop_duplicates('Supplier')
Upvotes: 3