Reputation: 15
I have a pandas DataFrame with three datetime columns. One of them is a log time, and the other two (let's say start and end) define a datetime interval. I want to drop records whose intervals conflict (overlap): among conflicting records, I would like to keep the one with the newest log time and drop the rest.
For example:
name | start | end | log time |
---|---|---|---|
r1 | 2022/08/05 09:00:00 | 2022/08/07 08:00:00 | 2022/08/06 08:00:00 |
r2 | 2022/08/06 09:00:00 | 2022/08/08 08:00:00 | 2022/08/05 08:00:00 |
r3 | 2022/08/07 09:00:00 | 2022/08/09 08:00:00 | 2022/08/04 08:00:00 |
In this table the intervals of r1 and r2 conflict, so I would like to keep r1 because it has the newer log time of the two.
So I would like to get the following result.
name | start | end | log time |
---|---|---|---|
r1 | 2022/08/05 09:00:00 | 2022/08/07 08:00:00 | 2022/08/06 08:00:00 |
r3 | 2022/08/07 09:00:00 | 2022/08/09 08:00:00 | 2022/08/04 08:00:00 |
Upvotes: 0
Views: 103
Reputation: 332
The line below finds the rows whose start time is greater than their log time and drops them by index (it assumes the log column is named logtime).
df.drop(df[ (df['start'] > df['logtime'])].index, inplace=True)
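Note that the line above compares each row's start with its own log time; it does not test whether two intervals overlap. If overlap is what matters, one possible approach is a greedy pass from newest to oldest log time, keeping a row only when it does not overlap a row that was already kept. This is a minimal sketch, not a definitive solution: the helper name drop_conflicting and its parameters are made up for illustration, the column names are taken from the question's table, and it assumes a unique index and that intervals which merely touch do not count as conflicting.

```python
import pandas as pd


def drop_conflicting(df, start="start", end="end", log="log time"):
    """Hypothetical helper: among overlapping intervals keep the newest log time."""
    kept = []  # index labels of the rows retained so far
    # Visit rows from the newest log time to the oldest.
    for idx, row in df.sort_values(log, ascending=False).iterrows():
        # Two intervals overlap when each one starts before the other ends.
        conflict = any(
            row[start] < df.at[k, end] and df.at[k, start] < row[end]
            for k in kept
        )
        if not conflict:
            kept.append(idx)
    # Select the kept rows in their original order (assumes a unique index).
    return df.loc[df.index.isin(kept)]


fmt = "%Y/%m/%d %H:%M:%S"
df = pd.DataFrame({
    "name": ["r1", "r2", "r3"],
    "start": pd.to_datetime(
        ["2022/08/05 09:00:00", "2022/08/06 09:00:00", "2022/08/07 09:00:00"], format=fmt),
    "end": pd.to_datetime(
        ["2022/08/07 08:00:00", "2022/08/08 08:00:00", "2022/08/09 08:00:00"], format=fmt),
    "log time": pd.to_datetime(
        ["2022/08/06 08:00:00", "2022/08/05 08:00:00", "2022/08/04 08:00:00"], format=fmt),
})

result = drop_conflicting(df)
print(result["name"].tolist())  # ['r1', 'r3']
```

On the sample data r2 is dropped because it overlaps the newer r1, and r3 is kept because it only overlapped r2. The loop compares every row with every kept row, so it is quadratic in the worst case and best suited to small or medium frames.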
Upvotes: 1