Reputation: 37
I have the following dataframe:
import pandas as pd
import numpy as np
np.random.seed(0)
# create an array of 5 dates starting at '2015-02-24', one per minute
rng = pd.date_range('2021-07-29', periods=5, freq='D')
rng_1 = pd.date_range('2021-07-30', periods=5, freq='D')
rng_2 = pd.date_range('2021-07-31', periods=5, freq='D')
df_status = ['received', 'send', 'received', 'send', 'send']
df = pd.DataFrame({ 'Date': rng, 'Date_1': rng_1, 'Date_2': rng_2, 'status': df_status })
print(df)
I would like to print out all the rows if at least one column contains a date that is equal to, or at least 2021-08-01. What would be the most effective way to do this?
I have tried to do this with the following code, however, I get the following error:
start_date = '2022-08-01'
start_date = pd.to_datetime(start_date, format="%Y/%m/%d")
mask = (df['Date'] >= start_date | df['Date_1'] >= start_date | df['Date_3'] >= start_date)
TypeError: unsupported operand type(s) for &: 'Timestamp' and 'DatetimeArray'
Thank you in advance.
Adjusted dataframe:
df = {'sample_received': {1: nan,
2: nan,
3: '2022-08-01 20:31:24',
4: '2022-08-01 20:25:45',
5: '2022-08-01 20:41:22'},
'result_received': {1: '2022-08-01 16:25:33',
2: '2022-08-01 13:25:36',
3: '2022-08-02 09:45:34',
4: '2022-08-02 09:52:59',
5: '2022-08-02 08:22:45'},
'status': {1: 'Approved',
2: 'Approved',
3: 'Approved',
4: 'Approved',
5: 'Approved'}}
Upvotes: 0
Views: 87
Reputation: 262284
Use boolean indexing with any
:
df[df.ge('2021-08-01').any(1)]
output:
Date Date_1 Date_2
1 2021-07-30 2021-07-31 2021-08-01
2 2021-07-31 2021-08-01 2021-08-02
3 2021-08-01 2021-08-02 2021-08-03
4 2021-08-02 2021-08-03 2021-08-04
intermediate:
df.ge('2021-08-01').any(1)
0 False
1 True
2 True
3 True
4 True
dtype: bool
filtering by name (Date
in the column name):
df[df.filter(like='Date').ge('2021-08-01').any(1)]
filtering by type:
df[df.select_dtypes('datetime64[ns]').ge('2021-08-01').any(1)]
Upvotes: 2
Reputation: 3845
You may use any
inside apply
:
df[df.apply(lambda x: any([x[col] >= pd.to_datetime('2021-08-01') for col in df.columns]), axis=1)]
Upvotes: 0