Reputation: 1477
I would like to compare two dates in Python 3: one comes from a pandas DataFrame and the other is calculated. I want to filter the DataFrame so that it keeps only the rows whose 'Publication_date' is less than or equal to today's date and greater than the date 10 years ago.
The pandas df looks like this:
PMID Publication_date
0 31611796 2019-09-27
1 33348808 2020-12-17
2 12089324 2002-06-27
3 31028872 2019-04-25
4 26805781 2016-01-21
I am doing the comparison as shown below.
df[(df['Publication_date']> datetime.date.today() - datetime.timedelta(days=3650)) &
(df['Publication_date']<= datetime.date.today())]
When applied to the df, the date filter above should exclude the third row (index 2, dated 2002-06-27).
The 'Publication_date' column has type 'string'. I converted it to a date using the line below in my script.
df_phenotype['publication_date']= pd.to_datetime(df_phenotype['publication_date'])
But this changes the column type to 'datetime64[ns]', which makes the comparison between 'datetime64[ns]' and datetime.date incompatible.
How can I perform this comparison?
Any help is highly appreciated.
Upvotes: 1
Views: 591
Reputation: 862641
You can work with datetimes using pandas alone. Timestamp.floor removes the time component from a datetime (it sets the time to 00:00:00):
df['Publication_date']= pd.to_datetime(df['Publication_date'])
today = pd.to_datetime('now').floor('d')
df1 = df[(df['Publication_date']> today - pd.Timedelta(days=3650)) &
(df['Publication_date']<= today)]
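For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same filter. The sample rows are copied from the question; the fixed `reference` date is an assumption made only so the result is reproducible, and in real use it would be replaced by the floored "today" value shown above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PMID": [31611796, 33348808, 12089324, 31028872, 26805781],
    "Publication_date": ["2019-09-27", "2020-12-17", "2002-06-27",
                         "2019-04-25", "2016-01-21"],
})

# Convert the string column to datetime64[ns].
df["Publication_date"] = pd.to_datetime(df["Publication_date"])

# Assumption: a fixed reference date stands in for "today" so the
# result does not depend on when the script is run.
reference = pd.Timestamp("2021-06-01")
cutoff = reference - pd.Timedelta(days=3650)

# Both sides of each comparison are pandas datetime types, so the
# datetime64[ns] vs datetime.date mismatch does not arise.
mask = (df["Publication_date"] > cutoff) & (df["Publication_date"] <= reference)
df1 = df[mask]
print(df1)
```

With this reference date, the 2002 row is dropped and the other four rows are kept.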
You can also use a 10-year offset:
today = pd.to_datetime('now').floor('d')
df1 = df[(df['Publication_date']> today - pd.offsets.DateOffset(years=10)) &
(df['Publication_date']<= today)]
print(df1)
PMID Publication_date
0 31611796 2019-09-27
1 33348808 2020-12-17
3 31028872 2019-04-25
4 26805781 2016-01-21
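The two variants do not produce exactly the same cutoff, because 3650 days ignores leap days while a 10-year offset is calendar-aware. A small sketch of the difference, again assuming a fixed reference date (chosen here only for illustration):

```python
import pandas as pd

# Assumption: fixed reference date used only for illustration.
reference = pd.Timestamp("2021-06-01")

# 3650 days back: ignores the leap days in the interval.
cutoff_days = reference - pd.Timedelta(days=3650)

# 10 calendar years back: same month and day, ten years earlier.
cutoff_years = reference - pd.DateOffset(years=10)

print(cutoff_days)   # 2011-06-04 00:00:00
print(cutoff_years)  # 2011-06-01 00:00:00
```

If the requirement is "the same calendar date 10 years ago", the DateOffset version is probably the closer match; the Timedelta version drifts by a few days depending on how many leap days fall in the range.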
Upvotes: 1