Hendrik

Reputation: 1168

How to find rows in a Pandas DataFrame where the string in a specific column doesn't match a pattern?

I have the following Pandas DF:

  col1     col2
0 "text 0" "1979.10.01."
1 "text 1" "1989.11.02."
2 "text 2" "2001.13.03."
3 "text 3" "2012.11.04."
4 "text 4" "1999.01.04."

Now I want to get the rows where col2 doesn't match the YYYY.MM.DD. pattern. Bonus question: how can I additionally check not just the pattern but also the validity of the date (in row 2, for example, the 13th month is invalid)?

Upvotes: 1

Views: 44

Answers (1)

jezrael

Reputation: 863751

You can test for the numeric pattern (4 digits, dot, 2 digits, dot, 2 digits, dot) with Series.str.contains:

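# raw string avoids invalid-escape warnings; ^ and $ make the whole value match, not just a substring
df1 = df[~df['col2'].str.contains(r'^\d{4}\.\d{2}\.\d{2}\.$')]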
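
For reference, here is a minimal self-contained sketch of the pattern check. It rebuilds the DataFrame from the question and adds one hypothetical malformed row (index 5, not in the original data) so the filter has something to return. Note that row 2 still passes here, because "2001.13.03." has the right shape even though the month is invalid.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical malformed row (index 5)
df = pd.DataFrame({
    "col1": ["text 0", "text 1", "text 2", "text 3", "text 4", "text 5"],
    "col2": ["1979.10.01.", "1989.11.02.", "2001.13.03.",
             "2012.11.04.", "1999.01.04.", "99.1.4"],
})

# Anchored pattern: the whole value must look like YYYY.MM.DD.
pattern = r"^\d{4}\.\d{2}\.\d{2}\.$"
matches = df["col2"].str.contains(pattern, regex=True)

# Rows whose col2 does not have the expected shape (only the added row 5)
bad_pattern = df[~matches]
print(bad_pattern)
```

The regex only checks the shape of the string, so it cannot catch values like a 13th month; that needs the datetime conversion.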

For the second part, you can try converting the column to datetimes using the YYYY.MM.DD. format. With errors='coerce', values that fail to parse become missing values (NaT), so the last step is to filter for those:

df2 = df[pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce').isna()]
print (df2)
     col1         col2
2  text 2  2001.13.03.

Details:

print (pd.to_datetime(df['col2'], format='%Y.%m.%d.', errors='coerce'))
0   1979-10-01
1   1989-11-02
2          NaT
3   2012-11-04
4   1999-01-04
Name: col2, dtype: datetime64[ns]
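
If you want to tell the two cases apart (wrong shape vs. right shape but impossible date), the two checks can be combined. This is only a sketch under the same assumptions as the question's data, with one hypothetical malformed row (index 5) added for illustration; the variable names are my own.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical malformed row (index 5)
df = pd.DataFrame({
    "col1": ["text 0", "text 1", "text 2", "text 3", "text 4", "text 5"],
    "col2": ["1979.10.01.", "1989.11.02.", "2001.13.03.",
             "2012.11.04.", "1999.01.04.", "99.1.4"],
})

# Shape check: the whole value must look like YYYY.MM.DD.
well_formed = df["col2"].str.contains(r"^\d{4}\.\d{2}\.\d{2}\.$")

# Validity check: unparseable values become NaT because of errors='coerce'
parsed = pd.to_datetime(df["col2"], format="%Y.%m.%d.", errors="coerce")

# Wrong shape altogether (row 5 in this sample)
wrong_pattern = df[~well_formed]

# Right shape, but not a real calendar date (row 2 in this sample)
invalid_date = df[well_formed & parsed.isna()]

print(wrong_pattern)
print(invalid_date)
```

Filtering on parsed.isna() alone returns both kinds of rows at once, which is enough if you do not need to distinguish them.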

Upvotes: 2
