Selecting pandas dataframe rows depending on other rows

Question

I have a pandas dataframe with a few thousand rows and multiple columns, one of which contains IDs (integers) and one that contains dates (python datetime objects). Different rows can have identical IDs and/or dates+time. I would like to add a boolean column to the dataframe which indicates for each row if the dataframe contains another row that has an identical ID AND has a date+time that is within 2 months of - BUT NOT identical to - the date of the row itself.

See below for some example data where the "Boolean condition" column is filled correctly.

index	ID	Date	Boolean condition
1	20000242	2021-06-18 09:46:13.433	True
2	20000242	2021-06-18 09:46:13.433	True
3	20000242	2021-07-16 11:41:49.557	True
4	20000242	2021-08-13 13:36:30.417	True
5	20103342	2018-10-19 11:10:45.817	False
6	20103657	2019-09-13 10:18:00.173	True
7	20103657	2019-09-13 10:18:00.173	True
8	20103657	2019-09-13 12:16:53.737	True
9	20103657	2019-10-04 14:24:41.067	True
10	20103657	2020-03-03 13:50:32.953	False
11	20103657	2020-03-03 13:50:32.953	False
12	20103657	2020-03-03 13:50:32.953	False
13	20103657	2020-03-03 13:50:32.953	False
14	20103657	2020-03-03 13:50:32.953	False
15	20105037	2020-12-18 15:48:09.187	False
16	20201854	2021-03-05 14:49:11.850	True
17	20201854	2021-04-02 14:07:50.213	True
18	20201854	2021-05-28 08:35:57.363	True
19	20201854	2021-05-28 08:35:57.363	True
20	20203290	2020-02-07 11:13:08.117	False
21	20203290	2020-02-07 11:13:08.117	False
22	20203290	2020-02-07 11:13:08.117	False
23	20203290	2020-02-07 11:13:08.117	False
24	20203290	2020-02-07 11:13:08.117	False
25	20203290	2020-02-07 11:13:08.117	False
26	20401499	2018-07-13 11:10:02.960	True
27	20401499	2018-07-13 11:10:02.960	True
28	20401499	2018-07-13 13:46:52.483	True
29	20401499	2018-07-13 13:46:52.483	True
30	20402860	2019-12-13 11:18:31.253	False

I've tried to accomplish this naively with nested loops, but this is not feasible for thousands of rows.

Does anybody know a more efficient / pythonic way to do this?

EDIT: Here is the same data in semi-colon separated values:

index;ID;Date;Boolean condition

1;20000242;2021-06-18 09:46:13.433;True

2;20000242;2021-06-18 09:46:13.433;True

3;20000242;2021-07-16 11:41:49.557;True

4;20000242;2021-08-13 13:36:30.417;True

5;20103342;2018-10-19 11:10:45.817;False

6;20103657;2019-09-13 10:18:00.173;True

7;20103657;2019-09-13 10:18:00.173;True

8;20103657;2019-09-13 12:16:53.737;True

9;20103657;2019-10-04 14:24:41.067;True

10;20103657;2020-03-03 13:50:32.953;False

11;20103657;2020-03-03 13:50:32.953;False

12;20103657;2020-03-03 13:50:32.953;False

13;20103657;2020-03-03 13:50:32.953;False

14;20103657;2020-03-03 13:50:32.953;False

15;20105037;2020-12-18 15:48:09.187;False

16;20201854;2021-03-05 14:49:11.850;True

17;20201854;2021-04-02 14:07:50.213;True

18;20201854;2021-05-28 08:35:57.363;True

19;20201854;2021-05-28 08:35:57.363;True

20;20203290;2020-02-07 11:13:08.117;False

21;20203290;2020-02-07 11:13:08.117;False

22;20203290;2020-02-07 11:13:08.117;False

23;20203290;2020-02-07 11:13:08.117;False

24;20203290;2020-02-07 11:13:08.117;False

25;20203290;2020-02-07 11:13:08.117;False

26;20401499;2018-07-13 11:10:02.960;True

27;20401499;2018-07-13 11:10:02.960;True

28;20401499;2018-07-13 13:46:52.483;True

29;20401499;2018-07-13 13:46:52.483;True

30;20402860;2019-12-13 11:18:31.253;False

Selecting pandas dataframe rows depending on other rows

Answers (1)

Related Questions