Reputation: 39
I have a dataframe with multiple columns. One of the columns contains dates in the format %m/%d/%Y
or null values. I need to check that every date in that column is in the correct format (the one mentioned above).
What I am trying to do is:
pd.to_datetime(df['DOB'], format='%m/%d/%Y', errors='coerce').all(skipna=True)
to check that the date format is correct while ignoring empty values, but I am getting this error:
TypeError: invalid_op() got an unexpected keyword argument 'skipna'
How can I do this, or what other logic can I apply?
EDIT 1: Suppose the data has 3 DOBs and 1 null value:
data = {"Name": ["James", "Alice", "Phil", "Jacob"],
"DOB": ["07-01-1997", "06-02-1995", "", "03-07-2002"]}
I modify the DOB column to convert the dates to my format and replace empty fields with NaN:
df['DOB']=pd.to_datetime(df['DOB']).apply(lambda cell: cell.strftime(DATE_IN_MDY) if not pd.isnull(cell) else np.nan)
In this case I want the result to be True.
Upvotes: 2
Views: 6236
Reputation: 862671
The idea is to build one mask for values that are empty strings OR (|) missing (Series.isna), and a second mask for the missing values produced by to_datetime with errors='coerce'. If the two masks are equal, no non-empty value failed to parse:
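import pandas as pd

data = {"Name": ["James", "Alice", "Phil", "Jacob"],
        "DOB": ["07-01-1997", "06-02-1995", "", "03-07-2002"]}
df = pd.DataFrame(data)
# m1: values that were already empty or missing in the input
m1 = df['DOB'].eq('') | df['DOB'].isna()
# m2: values that are missing after parsing (unparseable values become NaT)
m2 = pd.to_datetime(df['DOB'], errors='coerce').isna()
print(m1.eq(m2).all())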
True
Sample that returns False because of an invalid date (03-97-2002):
data = {"Name": ["James", "Alice", "Phil", "Jacob"],
"DOB": ["07-01-1997", "06-02-1995", "", "03-97-2002"]}
df = pd.DataFrame(data)
m1 = df['DOB'].eq('') | df['DOB'].isna()
m2 = pd.to_datetime(df['DOB'], errors='coerce').isna()
print(m1.eq(m2).all())
False
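The samples above let to_datetime infer the format. If the column must match %m/%d/%Y specifically, as the question asks, the same mask idea can be combined with an explicit format. This is only a minimal sketch: the helper name invalid_date_mask and the sample values are made up for illustration, and it assumes empty strings and NaN/None should be ignored.

```python
import pandas as pd


def invalid_date_mask(series: pd.Series, fmt: str = "%m/%d/%Y") -> pd.Series:
    """Flag non-missing values that do not parse with the given format.

    Hypothetical helper, not part of pandas.
    """
    # Values that were already empty or missing in the input are ignored.
    missing = series.isna() | series.eq("")
    # With errors="coerce", anything that does not match fmt becomes NaT.
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    # Invalid = became NaT during parsing but was not missing to begin with.
    return parsed.isna() & ~missing


# Illustrative data: two valid dates, an empty string, a missing value,
# an impossible day (97), and a date using dashes instead of slashes.
df = pd.DataFrame({
    "DOB": ["07/01/1997", "06/02/1995", "", None, "03/97/2002", "03-07-2002"],
})

mask = invalid_date_mask(df["DOB"])
column_is_valid = not mask.any()   # overall check for the whole column
bad_values = df.loc[mask, "DOB"]   # the offending entries, for reporting
print(column_is_valid)
print(bad_values.tolist())
```

Returning a mask rather than a single boolean makes it possible to report which rows failed. Note that strptime-style parsing may accept values without zero padding (for example 7/1/1997), so a stricter check would need an extra test such as a regular expression.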
Upvotes: 2