Reputation: 4100
Sample DF:
ID Name Price Date Fit_Test
1 Apple 10 2018-01-15 Super_Fit
2 Apple 10 2018-01-15 Super_Fit
3 Apple 10 2019-01-15 Super_Fit
4 Orange 12 2019-02-15 Not_Fit
5 Orange 12 2018-09-25 Not_Fit
6 Orange 12 NaT Not_Fit
7 Orange 12 2028-01-25 Not_Fit
8 Banana 15 2019-12-25 Medium_Fit
9 Banana 15 NaT Medium_Fit
10 Cherry 30 2021-06-23 Super_Fit
11 PineAp 30 2023-02-03 Medium_Fit
12 PineAp 30 2020-12-13 Medium_Fit
Expected DF:
ID Name Price Date Fit_Test
1 Apple 10 2018-01-15 Super_Fit
2 Apple 10 2018-01-15 Super_Fit
3 Apple 10 2019-01-15 Super_Fit
7 Orange 12 2028-01-25 Not_Fit
8 Banana 15 2019-12-25 Medium_Fit
9 Banana 15 NaT Medium_Fit
10 Cherry 30 2021-06-23 Super_Fit
11 PineAp 30 2023-02-03 Medium_Fit
Problem Statement:
I want to group-by
by Name
and Price
and then filter based on Date
with Fit_Test
as conditional column.
If Fit_Test is Super_Fit
then don't the no operation is needed. (Rows 1,2,3 and 10 is same in Input and Expected DF)
If within a Name
and Price
conditions and Fit_Test is not Super_Fit
and there is no NaT
in that group then compare the dates, whichever has the highest date keep it (ID 11&12 and in Expected 12 is deleted)
If within a Name
and Price
conditions and Fit_Test is not Super_Fit
and there is a NaT
in that group:
3.1 If the count in that group is greater than 2, then compare the dates, whichever has the highest date keep it (ID 4,5,6,7 and in Expected 4,5,6 is deleted)
3.2 If the count in that group is equal to 2, then keep both the rows ** (ID 8,9 and in Expected 8,9 is there)**
Upvotes: 1
Views: 154
Reputation: 862471
Use:
df['Date'] = pd.to_datetime(df['Date'])
m1 = df['Fit_Test'].eq('Super_Fit').groupby([df['Name'],df['Price']]).transform('all')
m2 = df['Date'].notna().groupby([df['Name'],df['Price']]).transform('all')
m22 = df['Date'].eq(df.groupby(['Name', 'Price'])['Date'].transform('max'))
m3 = df.groupby(['Name', 'Price'])['Date'].transform('size').eq(2)
df = df[m1 | (m2 & m22) | (~m2 & m3) | (~m2 & m22)]
#it seems conditions should be simplify
#df = df[m1 | m22 | (~m2 & m3)]
print (df)
ID Name Price Date Fit_Test
0 1 Apple 10 2018-01-15 Super_Fit
1 2 Apple 10 2018-01-15 Super_Fit
2 3 Apple 10 2019-01-15 Super_Fit
6 7 Orange 12 2028-01-25 Not_Fit
7 8 Banana 15 2019-12-25 Medium_Fit
8 9 Banana 15 NaT Medium_Fit
9 10 Cherry 30 2021-06-23 Super_Fit
10 11 PineAp 30 2023-02-03 Medium_Fit
Upvotes: 2