Reputation: 120
What is the most efficient way to filter a pandas dataframe based on a difference between two date columns?
For example, based on the following dataframe:
CADASTRO RESPOSTA EVAL
0 2021-06-01 2021-06-13 y
1 2021-06-01 2021-06-13 y
2 2021-06-01 2021-06-18 y
3 2021-06-01 2021-06-09 n
4 2021-06-01 2021-06-20 n
5 2021-06-01 2021-06-20 n
How can I filter it to have only the records in which the difference between the columns RESPOSTA
and CADASTRO
is lower than 15 days? I tried the following, unsucessfully:
import datetime
filtered_df = df[(df.RESPOSTA - df.CADASTRO).days < 15]
The desired output is:
CADASTRO RESPOSTA EVAL
0 2021-06-01 2021-06-13 y
1 2021-06-01 2021-06-13 y
3 2021-06-01 2021-06-09 n
Upvotes: 1
Views: 361
Reputation: 5597
Use timedelta
to compare the days difference
from datetime import timedelta
df = pd.DataFrame({
'CADASTRO': ['2021-06-01', '2021-06-01', '2021-06-01', '2021-06-01', '2021-06-01', '2021-06-01'],
'RESPOSTA': ['2021-06-13', '2021-06-13', '2021-06-18', '2021-06-09', '2021-06-20', '2021-06-20'],
'EVAL': ['y', 'y', 'y', 'n', 'n', 'n']
})
df['CADASTRO'] = pd.to_datetime(df['CADASTRO'])
df['RESPOSTA'] = pd.to_datetime(df['RESPOSTA'])
df['temp'] = df['RESPOSTA'] - df['CADASTRO']
df['temp'] = df['temp'].apply(lambda x: 0 if x < timedelta(days=15) else 1)
filtered_df = df.drop(df[df['temp']==0].index).drop(columns=['temp'])
Output filtered_df
Upvotes: 1
Reputation: 35626
Access days through the datetime accessor dt
# Ensure DateTime
df['CADASTRO'] = pd.to_datetime(df['CADASTRO'])
df['RESPOSTA'] = pd.to_datetime(df['RESPOSTA'])
# Access Days through dt.days
filtered_df = df[(df.RESPOSTA - df.CADASTRO).dt.days < 15]
filtered_df
:
CADASTRO RESPOSTA EVAL
0 2021-06-01 2021-06-13 y
1 2021-06-01 2021-06-13 y
3 2021-06-01 2021-06-09 n
Upvotes: 5