the man
the man

Reputation: 1391

Dealing with different date formats python?

I'm using the pd.to_datetime(df['Date'], format="%d/%m/%Y") method to try and convert the column to a datetime column.

Now, the dates are of the form day/month/year, but in varying different formats, for example:

5/8/2005
13/08/05
4/2/2006

I get an error: time data '13/08/05' does not match format '%d/%m/%Y' (match) when I try and use pd.to_datetime(), and when I remove the formatting, pandas incorrectly transforms the dates.

How do I deal with this?

Upvotes: 1

Views: 65

Answers (4)

Mike Williamson
Mike Williamson

Reputation: 3158

You cannot directly handle that. You can see details on datetime formatting and notice that there is a %y and %Y directive, meaning 2-digit and 4-digit year, respectively. (I did not know about dayfirst, but I think the apply use case is still helpful for others.)

I would suggest creating a small function and then using that function on a column apply. Something like this:

>>> def convertYear(val):
...     (day, month, year) = val.split('/')
...     if len(year) == 2:
...         if int(year) > 40:
...             year = '19' + year
...         else:
...             year = '20' + year
...         newvals = [day, month, year]
...         return '/'.join(newvals)
...     return val

Then you can call this function on the column of interest; something like:

df.Date.apply(convertYear, axis='index') # don't use axis=0, not readability-friendly

Now you can finally finish with:

pd.to_datetime(df['Date'], format="%d/%m/%Y"

Here Be Dragons

Be warned: even though my function tries to be clever, it is assuming that you have European dates (DD/MM/YYYY), I assume this because of 13/08/05. In the US, it is (MM/DD/YYYY).

Because of this annoyance, proper datetimes should be internationalized, which is really just the East Asian style. (Maybe all of Asia??)

Upvotes: 0

Anant Kumar
Anant Kumar

Reputation: 641

I was stuck in a similar problem once, where there were certain cases with a combination of "Month First" and "Day First". I tackled the problem using the following code.

df=pd.DataFrame({"A":["5/8/2005","13/08/05","4/2/2006"]})
df.loc[:,"IsMonth"]=df.loc[:,"A"].apply(lambda x: int(x.split("/")[0])>12)
df_1=df[df.loc[:,"IsMonth"]==True]
df_2=df[~df.index.isin(df_1.index)]

df_1["A"]=pd.to_datetime(df_1["A"])
df_2["A"]=pd.to_datetime(df_2["A"])

df=pd.concat([df_1,df_2])
df.drop("IsMonth", inplace=True, axis=1)

Please note that the default behavior of pd.to_datetime() is to use dayfirst=False

Upvotes: 0

Kuldip Chaudhari
Kuldip Chaudhari

Reputation: 1112

Try this

pd.to_datetime(df['Date'], dayfirst=True)

Upvotes: 0

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

Use dayfirst keyword in pd.to_datetime:

>>> pd.to_datetime(df['Date'], dayfirst=True)
0   2005-08-05
1   2005-08-13
2   2006-02-04
dtype: datetime64[ns]

Upvotes: 2

Related Questions