I have the following Pandas DF:
col1 col2
0 "text 0" "1979.10.01."
1 "text 1" "1989.11.02."
2 "text 2" "2001.13.03."
3 "text 3" "2012.11.04."
4 "text 4" "1999.01.04."
Now I want to get the rows where col2 doesn't fit the YYYY.MM.DD. pattern.
Bonus question: how can I additionally check not just the pattern but also the validity of the date (in row 2, for example, month 13 is invalid)?
You can test for the numeric pattern 4digits.2digits.2digits. with Series.str.contains and invert the mask with ~:
# raw string so \d and \. reach the regex engine; ^ and $ require the whole value to match
df1 = df[~df['col2'].str.contains(r'^\d{4}\.\d{2}\.\d{2}\.$')]
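A self-contained sketch of the pattern check on the question's sample data, using Series.str.fullmatch (available since pandas 1.1) as an equivalent of the anchored contains call. Note that every sample value fits the digit pattern, including "2001.13.03.", so df1 comes out empty; the pattern alone cannot catch an invalid month, which is what the bonus question is about.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['text 0', 'text 1', 'text 2', 'text 3', 'text 4'],
    'col2': ['1979.10.01.', '1989.11.02.', '2001.13.03.', '2012.11.04.', '1999.01.04.'],
})

# fullmatch requires the whole string to match the pattern,
# whereas an unanchored contains would also accept it as a substring.
matches = df['col2'].str.fullmatch(r'\d{4}\.\d{2}\.\d{2}\.')

# Keep only the rows that do NOT fit YYYY.MM.DD.
df1 = df[~matches]
print(df1)  # empty: all five values have the 4.2.2 digit shape
```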
For the second part, you can try converting to datetime with the format YYYY.MM.DD.; with errors='coerce', every value that fails to parse becomes a missing value (NaT), so finally filter for those:
df2 = df[pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce').isna()]
print (df2)
col1 col2
2 text 2 2001.13.03.
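If you want both conditions at once, one option is to combine the two masks so a row is flagged when it fails either check. This is a sketch, not part of the original answer: rows 'text 5' and 'text 6' are hypothetical values added only to exercise the pattern check. Keeping the regex alongside to_datetime can matter because strptime-style %m and %d may also accept values without zero padding, such as "1999.1.4.".

```python
import pandas as pd

# The question's data plus two hypothetical malformed rows ('text 5', 'text 6')
df = pd.DataFrame({
    'col1': ['text 0', 'text 1', 'text 2', 'text 3', 'text 4', 'text 5', 'text 6'],
    'col2': ['1979.10.01.', '1989.11.02.', '2001.13.03.', '2012.11.04.',
             '1999.01.04.', '1999.1.4.', 'not a date'],
})

# Check 1: exact shape, 4 digits, 2 digits, 2 digits, each followed by a dot
fits_pattern = df['col2'].str.fullmatch(r'\d{4}\.\d{2}\.\d{2}\.')

# Check 2: a real calendar date; anything unparseable becomes NaT
parsed = pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce')
is_valid_date = parsed.notna()

# A row is bad if it fails either check
bad = df[~(fits_pattern & is_valid_date)]
print(bad)
```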
Details:
print (pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce'))
0 1979-10-01
1 1989-11-02
2 NaT
3 2012-11-04
4 1999-01-04
Name: col2, dtype: datetime64[ns]