Remove rows in DataFrame with Groupby with different start and end dates for each id

Question

I have the following df and would like to drop the rows (date - d/m/y) before offer_date and after mature_date for each id. The final df would look like final df. Any help with this groupby would be awesome!

df

     date   offer_date mature_date  id    a    b    c   d
0    1/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
1    2/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
2    3/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
3    4/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
4    5/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
5    6/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
6    7/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
7    8/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
8    9/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
9    10/1/2000 4/1/2000   9/1/2000    1   10   20  10.0  11
10   1/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
11   2/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
12   3/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
13   4/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
14   5/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
15   6/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
16   7/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
17   8/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
18   9/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
19   10/1/2000 1/1/2000   5/1/2000    2   30   30  40.0  15
20   1/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
21   2/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
22   3/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
23   4/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
24   5/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
25   6/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
26   7/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
27   8/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
28   9/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
29   10/1/2000 2/1/2000   4/1/2000    3   33   35  40.0  15

final df

     date   offer_date mature_date  id    a    b    c   d
3    4/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
4    5/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
5    6/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
6    7/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
7    8/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
8    9/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
10   1/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
11   2/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
12   3/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
13   4/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
14   5/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
21   2/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
22   3/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
23   4/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15

Ben.T · Accepted Answer

you don't need groupby actually, you can do a global mask as all dates can be compared row wise. Convert to_datetime the columns to ensure the good comparisons.

# format for dates with day first
date_format = '%d/%m/%Y'
# create the mask with both inequality
# convert to dateitme to be ensure the selection
mask = (
    pd.to_datetime(df['date'], format=date_format)
      .ge(pd.to_datetime(df['offer_date'], format=date_format)) 
    & pd.to_datetime(df['date'], format=date_format)
        .le(pd.to_datetime(df['mature_date'], format=date_format))
)
print(df.loc[mask])
        date offer_date mature_date  id   a   b     c   d
3   4/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
4   5/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
5   6/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
6   7/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
7   8/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
8   9/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
10  1/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
11  2/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
12  3/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
13  4/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
14  5/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
21  2/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15
22  3/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15
23  4/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15

EDIT: as @Henry Ecker pointed out, in this case with a simple date format, one can use the parameter dayfirst, so using this pd.to_datetime(df['date'], dayfirst=True) for each date columns instead of defining the format would work the same.

Remove rows in DataFrame with Groupby with different start and end dates for each id

Answers (1)

Related Questions