MRUNAL MUNOT

Reputation: 423

Python ValueError: time data '02-01-2020' does not match format '%d/%m/%y' (match)

I am working on a dataset for machine learning, but parsing the date column raises an error because the dates do not match the format. I tried both "%d-%m-%y" and "%d/%m/%y" as the format string, but neither works. What can I do as dataset dates are in a different format?

df_MR['Date'] = pd.to_datetime(df_MR['Date'], format = "%d-%m-%y")

ValueError: time data '30/01/20' does not match format '%d-%m-%y' (match)


df_MR['Date'] = pd.to_datetime(df_MR['Date'], format = "%d/%m/%y")

ValueError: time data '02-01-2020' does not match format '%d/%m/%y' (match)

Upvotes: 1

Views: 5820

Answers (2)

Masklinn

Reputation: 42247

What can I do as dataset dates are in a different format?

  1. fix the data source so that it returns coherent data
  2. add an intermediate normalisation pass to your pipeline to handle this
  3. or try both formats in sequence e.g.
try: # try to parse 4 digit years
    df_MR['Date'] = pd.to_datetime(df_MR['Date'], format = "%d-%m-%Y")
except ValueError: # fallback to 2 digits year
    df_MR['Date'] = pd.to_datetime(df_MR['Date'], format = "%d/%m/%y")
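
If a single column mixes both formats, a column-wide try/except fails on both attempts, so option 2 (a normalisation pass that parses each value on its own) may be a better fit. Below is a minimal sketch using only the standard library. The name `parse_date` and the list of candidate formats are assumptions for illustration, based on the two values shown in the error messages; adjust them to whatever your data actually contains.

```python
from datetime import datetime

# Candidate formats, tried in order. This list is an assumption based on
# the two values in the error messages ('02-01-2020' and '30/01/20').
CANDIDATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%d-%m-%y", "%d/%m/%y")


def parse_date(text):
    """Return a datetime for the first candidate format that matches text."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no known date format matches {text!r}")


print(parse_date("02-01-2020"))  # 2020-01-02 00:00:00
print(parse_date("30/01/20"))    # 2020-01-30 00:00:00
```

It could then be applied per value, e.g. df_MR['Date'] = df_MR['Date'].map(parse_date). This is slower than a vectorised to_datetime call, but it fails loudly on any value that matches none of the listed formats.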

One more alternative is to not pass in a format at all, and hope that pandas will get it right. Since both your date formats are in DMY order, you could try pd.to_datetime(dt, dayfirst=True).
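
A rough sketch of that idea, assuming pandas 2.0 or later, where format="mixed" asks to_datetime to infer the format of each element individually (older versions do not accept this value). The sample series is made up from the two values in the error messages.

```python
import pandas as pd

# Sample values taken from the two error messages in the question.
dates = pd.Series(["02-01-2020", "30/01/20"])

# format="mixed" (assumes pandas >= 2.0) infers the format per element;
# dayfirst=True resolves ambiguous values such as 02-01 as 2 January.
parsed = pd.to_datetime(dates, format="mixed", dayfirst=True)
print(parsed)
```

Per-element inference can silently misread ambiguous values, so it is worth spot-checking a few rows of the result against the raw strings.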

Upvotes: 0

ApplePie

Reputation: 8942

I've had some success using the infer_datetime_format argument of to_datetime in a small example:

>>> df = pd.DataFrame({'a': ['02-01-2020', '03-02-20', '03/02/2020', '04/05/2020']})
>>> pd.to_datetime(df['a'], infer_datetime_format=True)
0   2020-02-01
1   2020-03-02
2   2020-03-02
3   2020-04-05
Name: a, dtype: datetime64[ns]

Upvotes: 5
