oceanbeach96
oceanbeach96

Reputation: 634

Remove rows in DataFrame with Groupby with different start and end dates for each id

I have the following df and would like to drop the rows (date - d/m/y) before offer_date and after mature_date for each id. The final df would look like final df. Any help with this groupby would be awesome!

df

     date   offer_date mature_date  id    a    b    c   d
0    1/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
1    2/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
2    3/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
3    4/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
4    5/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
5    6/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
6    7/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
7    8/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
8    9/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
9    10/1/2000 4/1/2000   9/1/2000    1   10   20  10.0  11
10   1/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
11   2/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
12   3/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
13   4/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
14   5/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
15   6/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
16   7/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
17   8/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
18   9/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
19   10/1/2000 1/1/2000   5/1/2000    2   30   30  40.0  15
20   1/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
21   2/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
22   3/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
23   4/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
24   5/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
25   6/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
26   7/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
27   8/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
28   9/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
29   10/1/2000 2/1/2000   4/1/2000    3   33   35  40.0  15

final df

     date   offer_date mature_date  id    a    b    c   d
3    4/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
4    5/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
5    6/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
6    7/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
7    8/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
8    9/1/2000  4/1/2000   9/1/2000    1   10   20  10.0  11
10   1/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
11   2/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
12   3/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
13   4/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
14   5/1/2000  1/1/2000   5/1/2000    2   30   30  40.0  15
21   2/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
22   3/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15
23   4/1/2000  2/1/2000   4/1/2000    3   33   35  40.0  15

Upvotes: 0

Views: 54

Answers (1)

Ben.T
Ben.T

Reputation: 29635

you don't need groupby actually, you can do a global mask as all dates can be compared row wise. Convert to_datetime the columns to ensure the good comparisons.

# format for dates with day first
date_format = '%d/%m/%Y'
# create the mask with both inequality
# convert to dateitme to be ensure the selection
mask = (
    pd.to_datetime(df['date'], format=date_format)
      .ge(pd.to_datetime(df['offer_date'], format=date_format)) 
    & pd.to_datetime(df['date'], format=date_format)
        .le(pd.to_datetime(df['mature_date'], format=date_format))
)
print(df.loc[mask])
        date offer_date mature_date  id   a   b     c   d
3   4/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
4   5/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
5   6/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
6   7/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
7   8/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
8   9/1/2000   4/1/2000    9/1/2000   1  10  20  10.0  11
10  1/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
11  2/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
12  3/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
13  4/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
14  5/1/2000   1/1/2000    5/1/2000   2  30  30  40.0  15
21  2/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15
22  3/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15
23  4/1/2000   2/1/2000    4/1/2000   3  33  35  40.0  15

EDIT: as @Henry Ecker pointed out, in this case with a simple date format, one can use the parameter dayfirst, so using this pd.to_datetime(df['date'], dayfirst=True) for each date columns instead of defining the format would work the same.

Upvotes: 2

Related Questions