Reputation: 2127
I have a dataframe that looks like:
Publication Date Date Value State
0 2018-12-12 2018-12-11 2.0 AL
1 2018-12-12 2018-12-11 1.0 AZ
2 2018-12-12 2018-12-11 1.0 AK
3 2018-12-12 2018-12-11 2.0 NJ
4 2018-12-12 2018-12-11 1.5 NY
5 2018-12-12 2018-12-12 12.0 AL
6 2018-12-12 2018-12-12 13.0 AZ
7 2018-12-12 2018-12-12 14.0 AK
8 2018-12-12 2018-12-12 12.0 NJ
9 2018-12-12 2018-12-12 11.5 NY
10 2018-12-13 2018-12-12 2.0 AL
11 2018-12-13 2018-12-12 3.0 AZ
12 2018-12-13 2018-12-12 6.0 AK
13 2018-12-13 2018-12-12 1.0 NJ
14 2018-12-13 2018-12-12 4.0 NY
I want to delete every occurrence where Publication date
= Date + 1
so when Publication Date
= 2018-12-12
and Date
= 2018-12-11
all of those values are dropped.
It would look like:
Publication Date Date Value State
2018-12-12 2018-12-12 12.0 AL
2018-12-12 2018-12-12 13.0 AZ
2018-12-12 2018-12-12 14.0 AK
2018-12-12 2018-12-12 12.0 NJ
2018-12-12 2018-12-12 11.5 NY
What's the easiest way to do this?
Upvotes: 3
Views: 694
Reputation: 403128
Get a mask, use it to filter df
:
m = df['Publication Date'].eq(df['Date'] + pd.Timedelta(1, unit='D'))
df[~m]
Publication Date Date Value State
5 2018-12-12 2018-12-12 12.0 AL
6 2018-12-12 2018-12-12 13.0 AZ
7 2018-12-12 2018-12-12 14.0 AK
8 2018-12-12 2018-12-12 12.0 NJ
9 2018-12-12 2018-12-12 11.5 NY
If "Publication Date" and "Date" are not date time columns, use pd.to_datetime(..., errors='coerce')
to coerce them first.
Upvotes: 4
Reputation: 588
For this, please import timedelta
from datetime import timedelta
First convert datatype of date column to datetime.
df.Date = pd.to_datetime(df.Date)
df["Publication Date"] = pd.to_datetime(df["Publication Date"])
Then filter dataframe
df = df[df["Publication Date"] != df.date + timedelta(days=1)];
Upvotes: 0