Vipool
Vipool

Reputation: 108

How to find rows of a dataframe containing a date value?

There is a huge dataframe containing multiple data types in different columns. I want to find rows that contain date values in different columns. Here a test dataframe:

   dt = pd.Series(['abc', datetime.now(), 12, '', None, np.nan, '2020-05-05'])
   dt1 = pd.Series([3, datetime.now(), 'sam', '', np.nan, 'abc-123', '2020-05-25'])
   dt3 = pd.Series([1,2,3,4,5,6,7])
   df = pd.DataFrame({"A":dt.values, "B":dt1.values, "C":dt3.values})

Now, I want to create a new dataframe that contains only dates in both columns A and B, here rows 2nd and last.
Expected output:

                            A                           B  C
1  2020-06-01 16:58:17.274311  2020-06-01 17:13:20.391394  2
6                  2020-05-05                  2020-05-25  7

What is the best way to do that? Thanks.

P.S.> Dates can be in any standard format.

Upvotes: 0

Views: 812

Answers (3)

jezrael
jezrael

Reputation: 862581

Solution for test only A,B columns is boolean indexing with DataFrame.notna and DataFrame.all for not match any non datetimes:

df = df[df[['A','B']].apply(pd.to_datetime, errors='coerce').notna().all(axis=1)]

print (df)
                            A                           B  C
1  2020-06-01 16:14:35.020855  2020-06-01 16:14:35.021855  2
6                  2020-05-05                  2020-05-25  7

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use:

m = df[['A', 'B']].transform(pd.to_datetime, errors='coerce').isna().any(axis=1)   
df = df[~m]

Result:

# print(df)
                            A                           B  C
1  2020-06-01 17:54:16.377722  2020-06-01 17:54:16.378432  2
6                  2020-05-05                  2020-05-25  7

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195418

import pandas as pd
from datetime import datetime

dt = pd.Series(['abc', datetime.now(), 12, '', None, np.nan, '2020-05-05'])
dt1 = pd.Series([3, datetime.now(), 'sam', '', np.nan, 'abc-123', '2020-05-25'])
dt3 = pd.Series([1,2,3,4,5,6,7])
df = pd.DataFrame({"A":dt.values, "B":dt1.values, "C":dt3.values})

m = pd.concat([pd.to_datetime(df['A'], errors='coerce'),
               pd.to_datetime(df['B'], errors='coerce')], axis=1).isna().all(axis=1)

print(df[~m])

Prints:

                            A                           B  C
1  2020-06-01 12:17:51.320286  2020-06-01 12:17:51.320826  2
6                  2020-05-05                  2020-05-25  7

Upvotes: 0

Related Questions