Rahul Agarwal
Rahul Agarwal

Reputation: 4100

Compare dates and filter rows within a group with conditions

Sample DF:

ID   Name        Price     Date             Fit_Test
1    Apple         10      2018-01-15          Super_Fit
2    Apple         10      2018-01-15          Super_Fit
3    Apple         10      2019-01-15          Super_Fit

4    Orange        12      2019-02-15          Not_Fit
5    Orange        12      2018-09-25          Not_Fit
6    Orange        12      NaT                 Not_Fit
7    Orange        12      2028-01-25          Not_Fit

8    Banana        15      2019-12-25          Medium_Fit
9    Banana        15      NaT                 Medium_Fit

10   Cherry         30     2021-06-23          Super_Fit

11   PineAp         30     2023-02-03          Medium_Fit
12   PineAp         30     2020-12-13          Medium_Fit

Expected DF:

ID   Name        Price     Date             Fit_Test
1    Apple         10      2018-01-15          Super_Fit
2    Apple         10      2018-01-15          Super_Fit
3    Apple         10      2019-01-15          Super_Fit

7    Orange        12      2028-01-25          Not_Fit

8    Banana        15      2019-12-25          Medium_Fit
9    Banana        15      NaT                 Medium_Fit

10   Cherry         30     2021-06-23          Super_Fit

11   PineAp         30     2023-02-03          Medium_Fit

Problem Statement:

I want to group-by by Name and Price and then filter based on Date with Fit_Test as conditional column.

  1. If Fit_Test is Super_Fit then don't the no operation is needed. (Rows 1,2,3 and 10 is same in Input and Expected DF)

  2. If within a Name and Price conditions and Fit_Test is not Super_Fit and there is no NaT in that group then compare the dates, whichever has the highest date keep it (ID 11&12 and in Expected 12 is deleted)

  3. If within a Name and Price conditions and Fit_Test is not Super_Fit and there is a NaT in that group:

    3.1 If the count in that group is greater than 2, then compare the dates, whichever has the highest date keep it (ID 4,5,6,7 and in Expected 4,5,6 is deleted)

    3.2 If the count in that group is equal to 2, then keep both the rows ** (ID 8,9 and in Expected 8,9 is there)**

Upvotes: 1

Views: 154

Answers (1)

jezrael
jezrael

Reputation: 862471

Use:

df['Date'] = pd.to_datetime(df['Date'])

m1 = df['Fit_Test'].eq('Super_Fit').groupby([df['Name'],df['Price']]).transform('all')

m2 = df['Date'].notna().groupby([df['Name'],df['Price']]).transform('all')

m22 = df['Date'].eq(df.groupby(['Name', 'Price'])['Date'].transform('max'))

m3 = df.groupby(['Name', 'Price'])['Date'].transform('size').eq(2)

df = df[m1 | (m2 & m22) | (~m2 & m3) | (~m2 & m22)]
#it seems conditions should be simplify
#df = df[m1 | m22 | (~m2 & m3)]
print (df)
    ID    Name  Price       Date    Fit_Test
0    1   Apple     10 2018-01-15   Super_Fit
1    2   Apple     10 2018-01-15   Super_Fit
2    3   Apple     10 2019-01-15   Super_Fit
6    7  Orange     12 2028-01-25     Not_Fit
7    8  Banana     15 2019-12-25  Medium_Fit
8    9  Banana     15        NaT  Medium_Fit
9   10  Cherry     30 2021-06-23   Super_Fit
10  11  PineAp     30 2023-02-03  Medium_Fit

Upvotes: 2

Related Questions